
How to extract content from a protected PDF
Some PDFs on the internet have copy protection that stops you from copying and pasting their content into a document you’re writing. Defeating this protection is very easy, as you will see in this post.
I will use a combination of open source tools to extract the content of a protected PDF.
This is what a protected PDF looks like in Adobe Acrobat under File – Properties:
You will need to obtain Ghostscript.
Ghostscript is an interpreter for the PostScript language and for PDF, and related software and documentation.
Run the self-extracting EXE from http://pages.cs.wisc.edu/~ghost/doc/GPL/gpl871.htm to install the engine:
gs871w32.exe, GPL Ghostscript 8.71 for 32-bit Windows (the common variety).
gs871w64.exe, GPL Ghostscript 8.71 for 64-bit Windows (x86_64).
Now install the viewer from http://pages.cs.wisc.edu/~ghost/gsview/get49.htm:
gsv49w32.exe, Win32 self-extracting archive.
gsv49w64.exe, Win64 (x86_64) self-extracting archive.
Then start GSview and open the PDF. You can either convert it to PS (PostScript), after which you’ll be able to edit it like any other document, or use the menu Edit – Text Extract to save the content to a text file. Enjoy 🙂
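If you prefer a script to the GSview menus, the same conversion can probably be driven from the Ghostscript command line. The following is only a rough sketch, not what GSview does internally. It assumes that the console executable is called gswin32c (often gs on Linux and macOS), that your Ghostscript build includes the ps2write and txtwrite output devices (txtwrite may be missing or limited in older releases such as 8.71), and that your version handles the protected file the same way GSview does. The helper names build_gs_command and convert are made up for this example.

```python
import subprocess


def build_gs_command(pdf_path, out_path, device="ps2write", gs_exe="gswin32c"):
    """Return the Ghostscript argument list for converting pdf_path to out_path.

    device: "ps2write" for PostScript, "txtwrite" for plain text
            (assumption: your Ghostscript build includes the device).
    gs_exe: console executable name (assumption: "gswin32c" on Windows,
            usually "gs" on Linux and macOS).
    """
    return [
        gs_exe,
        "-q",                        # suppress startup messages
        "-dBATCH",                   # exit after the last input file
        "-dNOPAUSE",                 # do not prompt between pages
        f"-sDEVICE={device}",        # select the output device
        f"-sOutputFile={out_path}",  # where the converted file is written
        pdf_path,
    ]


def convert(pdf_path, out_path, device="ps2write", gs_exe="gswin32c"):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gs_command(pdf_path, out_path, device, gs_exe), check=True)
```

With a hypothetical file named protected.pdf, calling convert("protected.pdf", "protected.ps") would write the PostScript version, and convert("protected.pdf", "protected.txt", device="txtwrite") would attempt the text extraction. Building the argument list in its own function makes it easy to print the command and check it before running anything.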