Portable Document Format (PDF) is a file format used to present documents in a manner independent of application software, hardware, and operating systems. Read more at Wikipedia.

  • PDFCreator easily creates PDFs from any Windows program. Use it like a printer in Word, StarCalc or any other Windows application. All you have to do is print to a special device: a virtual printer that creates a PDF file on your disk.
    • Development Status: 5 - Production/Stable
    • Intended Audience: End Users/Desktop
    • License: GNU General Public License (GPL)
    • Operating System: 32-bit MS Windows (95/98), All 32-bit MS Windows (95/98/NT/2000/XP)
    • Programming Language: Visual Basic
    • Topic: Office Suites, Printing
    • Translations: English, German
    • User Interface: Win32 (MS Windows)
  • Google’s Jeff Dean was one of the keynote speakers at an ACM workshop on large-scale computing systems, where he discussed some of the technical details of the company’s mighty infrastructure, which is spread across dozens of data centers around the world. His presentation gives some insight into what is going on at Google and how they have found innovative solutions in their never-ending quest for speed and bandwidth. Their figures impressed me a lot!

    You will learn about some of their in-house technologies:

    • Google File System (GFS): a scalable distributed file system for large, distributed, data-intensive applications.
    • MapReduce: a software framework introduced by Google to support distributed computing on large data sets on clusters of computers [Wikipedia]; see the Hadoop project for a free, open source Java MapReduce implementation.
    • BigTable: a compressed, high-performance, proprietary database system built on the Google File System (GFS); see the Hadoop HBase project for something similar.
    • Spanner, their new project: a storage and computation system that spans all their data centers.
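
    To make the MapReduce idea a bit more concrete, here is a minimal single-process sketch in Python. It is neither Google's nor Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework would distribute each phase over many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word of one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key to obtain the final word counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce one after the other in a single process."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))
```

    Calling word_count(["a b a", "b c"]) counts each word across both documents.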

    Read this great document online: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf (if it disappears, ask me for a copy).

  • Downloading resources on Android devices returns an 'Unknown file' in Google Chrome and in the internal browser, but not in Firefox for Android!

    Short version

    • Do not rely on self-signed certificates when downloading resources on Android: the Android download manager won't work with them (below Android 4.1.4, SSL was not even supported in the download manager).
    • Android does not support every kind of SSL cipher; check the compatibility table below.

    Long Story

    On some Android devices, clicking the download link returns an error and shows an 'Unknown file'. The file, which is 790 KB in size, gets downloaded only partially, and the amount varies: sometimes you get 140 KB, sometimes 224 KB or more.

    There is a workaround: if you hold the pointer on the link and choose 'Save', the saved document is correct and can be opened.

    This issue appears on some Android phones, but not on Android tablets (???) and never on iOS (sic).

    Looking at the logs, we found that the resource size reported in the Apache access log is not the same as in the Tomcat access log (only when the client is Android). With desktop-class browsers (Google Chrome, Firefox, Opera, Safari), the sizes reported by Tomcat and Apache are the same!
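
    Comparing the two access logs by hand is tedious, so here is a small sketch of how it could be automated. It assumes that both Apache and Tomcat write their access logs in the Common Log Format (request in quotes, followed by the status and the number of bytes sent); the function names and the sample log lines are hypothetical.

```python
import re

# Matches the quoted request, the status code and the byte count of a
# Common Log Format line, e.g.: "GET /xx.pdf HTTP/1.1" 200 808960
LOG_LINE = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def sizes_by_path(lines):
    """Collect the response sizes logged for each requested path."""
    sizes = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not an access log line we understand
        size = 0 if match.group("size") == "-" else int(match.group("size"))
        sizes.setdefault(match.group("path"), []).append(size)
    return sizes


def size_mismatches(apache_lines, tomcat_lines):
    """Return the paths whose logged sizes differ between the two logs."""
    apache = sizes_by_path(apache_lines)
    tomcat = sizes_by_path(tomcat_lines)
    mismatches = {}
    for path in sorted(apache.keys() & tomcat.keys()):
        if sorted(apache[path]) != sorted(tomcat[path]):
            mismatches[path] = (apache[path], tomcat[path])
    return mismatches
```

    Feeding it the lines of both log files gives, per path, the sizes seen by Apache and by Tomcat whenever they disagree.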

    After a lot of trial and error, we found out that Android is able to download the resource properly when connecting directly to Tomcat (i.e. without SSL). However, in this case there is a VERY strange behaviour: when we download the resource via HTTP, Android needs to connect twice! The first connection seems to abort, and only the second connection (the Android download manager) is able to fetch everything.

    After that, we enabled debug logging in Apache and had a look at the output.

    [Tue Jan 26 16:06:29 2016] [info] Initial (No.1) HTTPS request received for child 0 (server skye3.innoveo.com:443)
    [Tue Jan 26 16:06:29 2016] [debug] mod_proxy_http.c(56): proxy: HTTP: canonicalising URL //localhost:8443/xx.pdf
    [Tue Jan 26 16:06:29 2016] [debug] proxy_util.c(1525): [client] proxy: http: found worker http://localhost:8443/ for http://localhost:8443/xx.pdf
    [Tue Jan 26 16:06:29 2016] [debug] mod_proxy.c(1026): Running scheme http handler (attempt 0)
    [Tue Jan 26 16:06:29 2016] [debug] mod_proxy_http.c(1982): proxy: HTTP: serving URL http://localhost:8443/xx.pdf
    [Tue Jan 26 16:06:29 2016] [debug] proxy_util.c(2102): proxy: HTTP: has acquired connection for (localhost)
    [Tue Jan 26 16:06:29 2016] [debug] proxy_util.c(2158): proxy: connecting http://localhost:8443/xx.pdf to localhost:8443
    [Tue Jan 26 16:06:29 2016] [debug] proxy_util.c(2285): proxy: connected /xxxxx.pdf to localhost:8443
    [Tue Jan 26 16:06:29 2016] [debug] mod_proxy_http.c(1741): proxy: start body send
    [Tue Jan 26 16:06:29 2016] [info] [client] (104)Connection reset by peer: core_output_filter: writing data to the network
    [Tue Jan 26 16:06:29 2016] [info] [client] (103)Software caused connection abort: SSL output filter write failed.
    [Tue Jan 26 16:06:29 2016] [debug] mod_proxy_http.c(1851): proxy: end body send
    [Tue Jan 26 16:06:29 2016] [debug] proxy_util.c(2120): proxy: HTTP: has released connection for (localhost)
    [Tue Jan 26 16:06:29 2016] [info] [client] Connection to child 3 established (server skye3.innoveo.com:443)
    ... ~removed useless debug output~
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1966): [client] SSL virtual host for servername skye3.innoveo.com found
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 read client hello A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 write server hello A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 write certificate A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 write key exchange A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 write server done A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 flush data
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_io.c(1929): OpenSSL: read 5/5 bytes from BIO7f1a4c1230d0 [mem: 7f1a4c17a493] (BIO dump follows)
    ... ~removed useless debug output~
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 read client key exchange A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_io.c(1929): OpenSSL: read 5/5 bytes from BIO7f1a4c1230d0 [mem: 7f1a4c17a493] (BIO dump follows)
    ... ~removed useless debug output~
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 read finished A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 write session ticket A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 write change cipher spec A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 write finished A
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1853): OpenSSL: Loop: SSLv3 flush data
    [Tue Jan 26 16:06:29 2016] [debug] ssl_engine_kernel.c(1849): OpenSSL: Handshake: done
    [Tue Jan 26 16:06:29 2016] [info] Connection: Client IP:, Protocol: TLSv1.2, Cipher: ECDHE-RSA-AES256-SHA (256/256 bits)
    [Tue Jan 26 16:06:29 2016] [info] [client] (70014)End of file found: SSL input filter read failed.
    [Tue Jan 26 16:06:29 2016] [info] [client] Connection closed to child 2 with standard shutdown (server skye3.innoveo.com:443)
    [Tue Jan 26 16:06:29 2016] [info] [client] Connection to child 0 established (server skye3.innoveo.com:443)
    ...

    So we see that the initial SSL connection works: we can see the incoming request and the proxy request. The body is written, and then we get "Connection reset by peer".

    After a careful search, we are pretty sure that we are running into this problem: https://code.google.com/p/chromium/issues/detail?id=440951

    When you download something with Chromium, it works (even from insecure sources), which is why the first connection is fine. However, Chromium interrupts the download to hand it over to the Android download manager. This is why displaying pictures actually works, despite the fact that they are delivered through the same pipeline (i.e. skye code, Tomcat version, Apache and SSL). It is also why we see two downloads per click in the log files. The problem is that the Android download manager does NOT, EVER, download anything from insecure sources (e.g. self-signed certificates), and thus the final download fails. This is also true for the default Android browser, because it uses the Android download manager too.

    Solution: the only solution was to upgrade to valid SSL certificates (Verizon, Verisign or any other) instead of self-signed ones. This increased the number of working Android devices, but unfortunately some Android devices were still NOT able to download resources with a valid SSL certificate...

    Using the Android SDK debug console (adb.exe logcat > file.txt), we saw the following:

    	Line 7487: D/DownloadManager( 3054): [1] Starting
    	Line 7489: W/DownloadManager( 3054): [1] Stop requested with status HTTP_DATA_ERROR: Handshake failed
    	Line 7491: D/DownloadManager( 3054): [1] Finished with status WAITING_TO_RETRY

    This shows again that the initial connection to our server happens correctly but returns only partial content. The download is then forwarded to the download manager, which tries to open another connection, and that one still fails.
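
    A logcat dump quickly grows to thousands of lines, so it helps to filter it. The following sketch assumes the DownloadManager lines look exactly like the three lines quoted above; the function names are hypothetical.

```python
import re

# Matches e.g.: W/DownloadManager( 3054): [1] Stop requested with status ...
DOWNLOAD_MANAGER_LINE = re.compile(
    r"DownloadManager\(\s*(?P<pid>\d+)\): \[(?P<id>\d+)\] (?P<message>.*)"
)


def download_manager_events(lines):
    """Return (download id, message) for every DownloadManager log line."""
    events = []
    for line in lines:
        match = DOWNLOAD_MANAGER_LINE.search(line)
        if match:
            events.append((int(match.group("id")), match.group("message").strip()))
    return events


def failed_handshakes(lines):
    """Return the ids of the downloads that were stopped by a failed handshake."""
    return [
        download_id
        for download_id, message in download_manager_events(lines)
        if "Handshake failed" in message
    ]
```

    Run it over the lines of file.txt to list the downloads that died during the SSL handshake.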

    Solution: change the Apache cipher suite according to the table below.

    Android compatibility table


    Depending on which versions of Android you would like to support, you'll have to find a cipher suite that is supported by both iOS and Android without sacrificing too much security.

    Android Version      Released        API Level  Name                Build Version Code
    Android 6.0          August 2015     23         Marshmallow         Android.OS.BuildVersionCodes.Marshmallow
    Android 5.1          March 2015      22         Lollipop            Android.OS.BuildVersionCodes.LollipopMr1
    Android 5.0          November 2014   21         Lollipop            Android.OS.BuildVersionCodes.Lollipop
    Android 4.4W         June 2014       20         KitKat Watch        Android.OS.BuildVersionCodes.KitKatWatch
    Android 4.4          October 2013    19         KitKat              Android.OS.BuildVersionCodes.KitKat
    Android 4.3          July 2013       18         Jelly Bean          Android.OS.BuildVersionCodes.JellyBeanMr2
    Android 4.2-4.2.2    November 2012   17         Jelly Bean          Android.OS.BuildVersionCodes.JellyBeanMr1
    Android 4.1-4.1.1    June 2012       16         Jelly Bean          Android.OS.BuildVersionCodes.JellyBean
    Android 4.0.3-4.0.4  December 2011   15         Ice Cream Sandwich  Android.OS.BuildVersionCodes.IceCreamSandwichMr1
    Android 4.0-4.0.2    October 2011    14         Ice Cream Sandwich  Android.OS.BuildVersionCodes.IceCreamSandwich
    Android 3.2          June 2011       13         Honeycomb           Android.OS.BuildVersionCodes.HoneyCombMr2
    Android 3.1.x        May 2011        12         Honeycomb           Android.OS.BuildVersionCodes.HoneyCombMr1
    Android 3.0.x        February 2011   11         Honeycomb           Android.OS.BuildVersionCodes.HoneyComb
    Android 2.3.3-2.3.4  February 2011   10         Gingerbread         Android.OS.BuildVersionCodes.GingerBreadMr1
    Android 2.3-2.3.2    November 2010   9          Gingerbread         Android.OS.BuildVersionCodes.GingerBread
    Android 2.2.x        June 2010       8          Froyo               Android.OS.BuildVersionCodes.Froyo
    Android 2.1.x        January 2010    7          Eclair              Android.OS.BuildVersionCodes.EclairMr1
    Android 2.0.1        December 2009   6          Eclair              Android.OS.BuildVersionCodes.Eclair01
    Android 2.0          November 2009   5          Eclair              Android.OS.BuildVersionCodes.Eclair
    Android 1.6          September 2009  4          Donut               Android.OS.BuildVersionCodes.Donut
    Android 1.5          May 2009        3          Cupcake             Android.OS.BuildVersionCodes.Cupcake
    Android 1.1          February 2009   2          Base                Android.OS.BuildVersionCodes.Base11
    Android 1.0          October 2008    1          Base                Android.OS.BuildVersionCodes.Base

    It is always a good idea to validate your SSL settings by using one of the online SSL testing services. Some even report whether you are vulnerable to common SSL attacks.
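
    Before touching the server, you can also check locally which ciphers a given cipher string would enable. This is only a sketch: it relies on Python's ssl module and on the OpenSSL library installed on your own machine, which may differ from the one used by your Apache. The string uses the OpenSSL cipher list syntax, the same syntax Apache's SSLCipherSuite directive accepts; "HIGH:!aNULL:!MD5" is just an example value, not a recommendation.

```python
import ssl


def enabled_ciphers(cipher_string):
    """List the cipher names the local OpenSSL enables for a cipher string.

    Raises ssl.SSLError if the string selects no cipher at all.
    Note: recent OpenSSL versions always add their TLS 1.3 ciphers,
    whatever the cipher string says.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.set_ciphers(cipher_string)
    return [cipher["name"] for cipher in context.get_ciphers()]


if __name__ == "__main__":
    for name in enabled_ciphers("HIGH:!aNULL:!MD5"):
        print(name)
```

    Compare the printed names with what your oldest supported Android version accepts before changing the Apache configuration.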

  • Some PDFs on the internet have copy protection to make sure you cannot copy and paste any content from the PDF into a document you're writing. Defeating this protection is very easy, as you will see in this post.

    I will use a combination of open source tools to extract the content of a protected PDF.





    This is what a protected PDF looks like in Adobe Acrobat under File - Properties


    You will need to obtain GhostScript

    Ghostscript is an interpreter for the PostScript language and for PDF, and related software and documentation.

    Run the self-extracting EXE from http://pages.cs.wisc.edu/~ghost/doc/GPL/gpl871.htm to install the engine:

    gs871w32.exe, GPL Ghostscript 8.71 for 32-bit Windows (the common variety).
    gs871w64.exe, GPL Ghostscript 8.71 for 64-bit Windows (x86_64).

    Now install the viewer from http://pages.cs.wisc.edu/~ghost/gsview/get49.htm:

    gsv49w32.exe, Win32 self-extracting archive
    gsv49w64.exe, Win64 (x86_64) self-extracting archive


    Then start GSview and open the PDF. You can either convert it to PS (PostScript), after which you can edit it like any other document, or use the menu Edit - Text Extract to save the content to a text file. Enjoy :-)
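
    If you prefer the command line to the GSview menus, the text extraction can also be scripted. This is a sketch and not the GSview route described above: it assumes a Ghostscript build that includes the txtwrite device (run gs -h to list the available devices), which may not be the case for the 8.71 build linked above. The function names are hypothetical; on Windows the console executable is usually called gswin32c or gswin64c instead of gs.

```python
import subprocess


def build_extract_command(pdf_path, txt_path, gs_executable="gs"):
    """Build the Ghostscript command line that writes the PDF text to txt_path."""
    return [
        gs_executable,
        "-q",                        # no startup banner
        "-dNOPAUSE",                 # do not wait for a key press after each page
        "-dBATCH",                   # exit when the input file is processed
        "-sDEVICE=txtwrite",         # text output device (assumed to be available)
        "-sOutputFile=" + txt_path,
        pdf_path,
    ]


def extract_text(pdf_path, txt_path, gs_executable="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(
        build_extract_command(pdf_path, txt_path, gs_executable), check=True
    )
```

    For example, extract_text("protected.pdf", "content.txt", "gswin32c") would run the 32-bit Windows console version.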

  •  iText is a library that allows you to generate PDF files on the fly. The iText classes are very useful for people who need to generate read-only, platform independent documents containing text, lists, tables and images. The library is especially useful in combination with Java(TM) technology-based Servlets: The look and feel of HTML is browser dependent; with iText and PDF you can control exactly how your servlet's output will look.
    iText requires JDK 1.2. It is available for free under a dual license: MPL and LGPL. It also has a complete (but outdated) list of features, and lets you perform operations on existing PDFs (concatenate, split, add pages, add an overlay and so on...).

  • Last year I was at Jazoon 08, and I forgot to tell you how good some of the presentations about Maven were.

    Let the Continuous Build Embrace Your Database

    "JUnit tests should not depend on database state." - "Set up your test data before you run your test." - We figure this just does not always scale. Mocking test data for hundreds of tables may not be suitable and database schemes evolve as the application does.
    These are common challenges when developing large J2EE systems. This presentation shows practical approaches for project setups and (functional) tests that decided to depend on a database. Developers and build servers may or may not share the database and support for this should be as simple as possible. We give an overview of what proved to be a good setup for an Eclipse / Maven 2 based development and build environment that relies on an evolving relational schema.

    Read More Here 

    The PDF can no longer be downloaded; fortunately, I made a backup 2 years ago, just in case. I uploaded the presentation to SlideShare.

    Here is the mind map I made during the presentation:

    • Continuous build for DB
      • db changes
        • SQL script patches
        • changes in schema
        • different db state for each trunk, tag and branch?
        • = hell of synchronization issues
        • they put scripts in SVN
        • only run the scripts modified since the last build
        • run SQL scripts against a reference db before pushing the same changes to prod
        • ex: developer commits, build server polls SVN and launches the build, then propagates
        • they use Continuum
      • they have made a framework that adds some tables to keep track of which .sql files have run
      • and at which SVN revision each .sql file was
      • so they can run only the delta scripts
      • ex: version 1.0 in prod, but a bug appears
        • -> open a branch
        • -> automatic run of branch SQL scripts also on trunk
      • Idempotent
        • but the same script applied twice on different database states does not give the same result
          • so they have to make scripts idempotent by checking/handling all previous versions
        • views and triggers can easily be made idempotent
      • they have DB quality checks
        • primary keys constraints
        • foreign keys
        • etc..
    • fighting bugs
      • not breaking sql scripts
      • no regressions
    • rerunnable junit functional tests
      • auto rollback junit class
        • their own impl of datasource
          • and connection
      • don’t expect developers to properly call rollback in teardown
      • extend autorollbackjunittestcase.class
      • an auto-rollback test case also exists in Spring, see spring-test.jar
    • eclipse maven setup
      • for junit tests
      • read junit.properties
      • if a junit-fritz.properties exists, it will use this user config file
        • good idea
        • the file will be committed but won't break the Continuum build server
      • multi modules
        • different classpaths (test and main) between Eclipse and Maven
        • they use properties in pom.xml and variables in properties
          • -> filter
    • done by Telekurs
      • they have 70 modules
      • netcetera.ch
    • make a try
      • go to workspace in DOS
        • run "mvn clean test" in pk common; it should build common like in TeamCity
      • use the Spring test framework of Spring 2.5
    • outlook
      • only Oracle
      • they search for good test data among their 1 TB of data
      • they also want to use Maven in Eclipse; they use the command line right now
    • ideas
      • they store the scripts they have run to create the database, and their SVN revisions, in the db

        someone in the room proposed keeping the data in the build and adding a column to know whether data was created by JUnit or by the main code
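
    The "only run delta scripts" idea from the mind map can be sketched in a few lines. This is not the framework shown in the talk: it is a minimal illustration that assumes SQLite instead of Oracle, and the table name applied_scripts, the function names and the script names are all hypothetical. Each script is recorded with the SVN revision at which it was last run, and it is run again only when a newer revision shows up.

```python
import sqlite3


def ensure_tracking_table(conn):
    """Create the bookkeeping table that remembers which scripts have run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_scripts ("
        "name TEXT PRIMARY KEY, revision INTEGER NOT NULL)"
    )


def apply_delta(conn, scripts):
    """Run only new or changed scripts.

    scripts is a list of (name, svn_revision, sql) tuples.
    Returns the names of the scripts that were actually run.
    """
    ensure_tracking_table(conn)
    applied = dict(conn.execute("SELECT name, revision FROM applied_scripts"))
    ran = []
    for name, revision, sql in scripts:
        if applied.get(name, -1) >= revision:
            continue  # this revision (or a newer one) has already been applied
        conn.executescript(sql)
        conn.execute(
            "INSERT OR REPLACE INTO applied_scripts (name, revision) VALUES (?, ?)",
            (name, revision),
        )
        ran.append(name)
    conn.commit()
    return ran
```

    Because a changed script is run again on a database that already saw its previous revision, the scripts themselves still have to be idempotent, for example by using CREATE TABLE IF NOT EXISTS.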


    Database with junit
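
    The "auto rollback" base class for rerunnable tests mentioned in the mind map can be illustrated as follows. The talk was about JUnit and a custom DataSource; this sketch only transposes the idea to Python's unittest and SQLite, and the class, table and test names are hypothetical. Every test runs inside a transaction that the base class rolls back in tearDown, so developers do not have to remember to do it.

```python
import sqlite3
import unittest


class AutoRollbackTestCase(unittest.TestCase):
    """Base class: whatever a test writes is rolled back after the test."""

    @classmethod
    def setUpClass(cls):
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
        cls.conn.commit()

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def tearDown(self):
        # sqlite3 opens a transaction implicitly before the first INSERT,
        # so rolling back here undoes everything the test has written.
        self.conn.rollback()


class CustomerTest(AutoRollbackTestCase):
    def test_insert_is_visible_inside_the_test(self):
        self.conn.execute("INSERT INTO customer (name) VALUES ('fritz')")
        count = self.conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
        self.assertEqual(count, 1)

    def test_table_is_empty_again(self):
        count = self.conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
        self.assertEqual(count, 0)
```

    Both tests pass in any order, because neither of them leaves data behind.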

  • Another user lock-in format full of promise from Microsoft: Metro
    Metro will be based on XML (eXtensible Markup Language), an open standard widely used on the internet.
    Read more here:
    Metro is basically M$'s response to the widely used Adobe PDF (Portable Document Format) file format... There are some articles that try to explain why M$ is now trying to control PDF: Microsoft gunning for Adobe's PDF format? and Microsoft Metro Threatens Adobe Acrobat, a very good article on what Metro may change in our lives.

    The Metro Specification and Reference Guide is available from M$ HERE

    Note: If you ever want to create, read or modify PDFs using Java, there is a huge list of open source libraries and tools HERE

  • The JBoss division of Red Hat develops several large open source Java applications. These include the JBoss application server, Hibernate, Seam, and several others. These applications primarily used Ant for builds, tests, releases and other parts of the project life cycle. As the size of these projects increased, several problems were experienced with the build system. The builds became difficult to maintain for current developers and difficult to understand and use for new developers. Managing the various project dependencies also became more difficult as dependency versions were changed, source code was moved, etc. There was also a lack of consistency from one project to the next. Since there were only minimal standards in place, the build scripts of the projects would be very different from each other. These and other issues led to the decision to migrate from Ant to Maven. Read More HERE

  • Last year I was at Jazoon 08, and I forgot to tell you how good some of the presentations about Maven were.

    Module-based development with Spring and Maven 2

    Modularity belongs to the basic architectural best practices. Splitting your application into separate modules can reduce undesired coupling as well as lead to high cohesion, reduce complexity, simplify team development, and decrease execution size by using only the required modules. This talk presents how we combine existing technologies (Spring and Maven 2) to get seamless module support for the development, test and runtime of Java applications. Maven is concerned with build time aspects, Spring is concerned with run time aspects and there are some shared aspects that concern both. A developer can thus define in his module what the module shall do during development, test and runtime. This leads to better separation of concerns as each module can focus on its aspects and is less bothered by aspects of other modules. We argue that to realize the full potential, only combining plain Maven 2 and Spring is not sufficient and discuss what to add. The module support is implemented and freely available in the open source EL4J project (http://EL4J.sf.net). Read More Here

  • Last year I was at Jazoon 08, and I forgot to tell you how good some of the presentations about Maven were.



    Technology management with Maven

    The list of dependencies in a project of a certain size can be very long. New frameworks and libraries emerge at a fast pace, and they often affect that list. Developers and managers have to keep track of dependencies by maintaining dependency repositories, and they have to ensure that the accumulated knowledge is always available in an easily accessible and distributable form. Maven provides effective mechanisms to cope with the breadth of such dependencies. Also, there are tools that help manage artifact repositories in dealing with the information overload often associated with repositories. However, Maven does not support the concept of technology lifecycle (not to be confused with the Maven build lifecycle), which implies that it does not support technology lifecycle handling and storing of knowledge about the quality of a dependency. Read More Here