
How to extract content from protected PDF

Some PDFs on the internet are copy-protected so that you cannot copy and paste any of their content into a document you're writing. As you will see in this post, defeating this protection is very easy.

I will use a combination of open-source tools to extract the content of a protected PDF.

This is what a protected PDF looks like in Adobe Acrobat under File – Properties:

[Screenshot: document properties of a protected PDF, showing the copy restriction]

You will need to obtain Ghostscript.

Ghostscript is an interpreter for the PostScript language and for PDF.

Run the self-extracting EXE for your platform from http://pages.cs.wisc.edu/~ghost/doc/GPL/gpl871.htm to install the engine:

gs871w32.exe, GPL Ghostscript 8.71 for 32-bit Windows (the common variety).
gs871w64.exe, GPL Ghostscript 8.71 for 64-bit Windows (x86_64).
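To check that the install worked, it helps to confirm that the Ghostscript command-line executable can be found. The sketch below is a minimal Python example that assumes the usual console executable names (`gswin64c` or `gswin32c` on Windows, `gs` on Linux/macOS) and assumes the Ghostscript `bin` directory has been added to your PATH, which the installer may not do for you.

```python
import shutil
import subprocess
from typing import Optional

# Assumed executable names: Windows console builds first, then plain "gs".
CANDIDATES = ("gswin64c", "gswin32c", "gs")


def find_ghostscript() -> Optional[str]:
    """Return the full path of the first Ghostscript executable found on PATH, or None."""
    for name in CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    gs = find_ghostscript()
    if gs is None:
        print("Ghostscript not found on PATH")
    else:
        # "--version" makes Ghostscript print only its version number and exit.
        result = subprocess.run([gs, "--version"], capture_output=True, text=True)
        print(gs, result.stdout.strip())
```

If nothing is found, add the Ghostscript `bin` folder to PATH or call the executable by its full path.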

Next, install the GSview viewer from http://pages.cs.wisc.edu/~ghost/gsview/get49.htm

gsv49w32.exe Win32 self extracting archive
gsv49w64.exe Win64 (x86_64) self extracting archive

[Screenshot: the protected PDF opened in GSview]

Then start GSview and open the PDF. From there you can either convert it to PS (PostScript), which you can then edit like any other document, or use the menu Edit – Text Extract to save the content to a text file. Enjoy 🙂
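GSview is a front end to Ghostscript, so the same two conversions can also be scripted. The sketch below is a minimal Python example, not the author's method. It assumes the `ps2write` device for PostScript output and the `txtwrite` device for plain text, where `txtwrite` is only shipped in later Ghostscript 9.x releases and not in the 8.71 build linked above. The executable name `gswin64c` and the helper names `build_gs_command` and `convert` are hypothetical placeholders.

```python
import subprocess
from typing import List

# Placeholder executable: use "gswin32c" on 32-bit Windows or "gs" on Linux/macOS.
GS_EXE = "gswin64c"


def build_gs_command(input_pdf: str, output_file: str, device: str,
                     gs_exe: str = GS_EXE) -> List[str]:
    """Build a Ghostscript command line that processes input_pdf with an output device."""
    return [
        gs_exe,
        "-dNOPAUSE",   # don't pause between pages
        "-dBATCH",     # exit after processing the file instead of waiting at a prompt
        "-dSAFER",     # disable file deletion/renaming and other unsafe operations
        f"-sDEVICE={device}",
        f"-sOutputFile={output_file}",
        input_pdf,
    ]


def convert(input_pdf: str, output_file: str, device: str,
            gs_exe: str = GS_EXE) -> None:
    """Run Ghostscript; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_gs_command(input_pdf, output_file, device, gs_exe), check=True)
```

For example, `convert("protected.pdf", "protected.txt", "txtwrite")` would write the extracted text, and `convert("protected.pdf", "protected.ps", "ps2write")` the PostScript version (file names hypothetical).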

About The Author

I worked with various insurance companies across Switzerland on online applications handling premium volumes in the billions. I love to continuously spark my creativity in many different and challenging open-source projects, fueled by my great passion for innovation and blockchain technology. In my technical role as a senior software engineer and blockchain consultant, I help to define and implement innovative solutions in the scope of both blockchain and traditional products, solutions, and services. I can support the full spectrum of software development activities, from analyzing ideas and business cases up to the production deployment of the solutions. I'm the Founder and CEO of Disruptr GmbH.
