Text in PDF files

Text in PDF files can be either compressed or uncompressed. The compression algorithm originally used for text is LZW; more recent PDF files usually rely on Flate, the algorithm also used in ZIP files. Once text has been compressed, you can no longer read it when the PDF file is opened in a text editor or word processor. LZW compression works by replacing frequently recurring sequences of data (like the word ‘the’ in a text file) with a single short code. For text it typically gives a compression ratio of about 2 to 1.
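
To make the idea behind LZW a little more concrete, here is a minimal textbook sketch in Python. It is not the exact filter used inside PDF files, which also packs the codes into a bit stream and reserves a few special codes; the function names are my own and only serve as illustration.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: turn bytes into a list of integer codes."""
    # Start with one table entry for every possible single byte.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep growing the sequence while it is already known.
            current = candidate
        else:
            # Emit the code of the longest known sequence and
            # remember the new, longer sequence for later reuse.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from the codes of lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Special case: the code refers to the entry being built.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


sample = b"the cat and the dog and the bird"
codes = lzw_compress(sample)
print(len(sample), "bytes became", len(codes), "codes")
```

The more often a sequence such as ‘the ’ comes back, the longer the table entries become and the fewer codes are needed, which is why ordinary text compresses well.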

How to edit text in a PDF

Acrobat Professional has a “TouchUp Text” tool that allows you to make small changes to the text in a PDF document. Unfortunately, text is stored line by line in a PDF. This means that a PDF file is not ‘aware’ of the way text flows in a document. If you use Acrobat to add a few words to a line of text, the extra words won’t overflow to the next line.
If you want to make big text changes in a PDF and absolutely need the text to reflow, look at Infix PDF Editor, a stand-alone Windows program that offers powerful text editing capabilities for PDF files.
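
The difference between editing a single stored line and reflowing a paragraph can be sketched with a toy model in Python. This is only an illustration: a list of strings stands in for the separately stored lines, and the names `touch_up` and `reflow` are made up for this example, not taken from Acrobat or Infix.

```python
import textwrap

# Toy model: every line is stored as an independent string,
# much like separately positioned lines on a PDF page.
lines = [
    "PDF stores each line of",
    "text as a separate item.",
]


def touch_up(stored_lines, index, old, new):
    """Edit one stored line; the neighbouring lines are left untouched."""
    edited = list(stored_lines)
    edited[index] = edited[index].replace(old, new)
    return edited


def reflow(stored_lines, width):
    """Treat the lines as one paragraph and wrap it again."""
    return textwrap.wrap(" ".join(stored_lines), width=width)


edited = touch_up(lines, 0, "each line", "each and every single line")
print(edited)              # the first line simply grows longer
print(reflow(edited, 24))  # a reflowing editor redistributes the words
```

In the first result the edited line just sticks out beyond the others, which is what happens with a simple touch-up. The second result shows what a reflowing editor has to do: rebuild the paragraph and break it into lines again.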

How to extract text from a PDF

Since I have rarely needed to do this, I simply use Acrobat to select the text that I need and then COPY it so that I can PASTE it into a text document. If you select a couple of lines from a two-column layout, you’ll see that the selection runs across both columns. Fortunately, if you COPY such text and PASTE it into a document, you will get the two columns one after the other.
There are, however, numerous tools available for extracting the text of a PDF file that are more efficient than my manual workaround.
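
To give an idea of what such tools do under the hood, here is a very rough Python sketch that pulls the text out of an uncompressed page content stream. It only looks for simple literal strings followed by the `Tj` (show text) operator. Real extraction tools also have to deal with compressed streams, `TJ` arrays, hexadecimal strings and font encodings, so treat this as an illustration under those assumptions. The sample stream and the function name `extract_text` are made up for this example.

```python
import re

# A literal string in parentheses followed by the Tj operator.
# Backslash escapes such as \( and \) may occur inside the string.
SHOW_TEXT = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_text(content_stream: bytes) -> list[str]:
    """Return the literal strings shown with Tj, in stream order."""
    pieces = []
    for match in SHOW_TEXT.finditer(content_stream):
        raw = match.group(1)
        # Undo the escapes for parentheses and backslashes.
        raw = re.sub(rb"\\([()\\])", rb"\1", raw)
        # Assumption: the text uses a simple single-byte encoding.
        pieces.append(raw.decode("latin-1"))
    return pieces


# Hypothetical, hand-written content stream with two lines of text.
sample_stream = b"""BT
/F1 12 Tf
72 720 Td
(Text in PDF files) Tj
0 -14 Td
(can be \\(un\\)compressed.) Tj
ET"""

for line in extract_text(sample_stream):
    print(line)
```

Note that each stored line comes out as a separate piece, which matches the line-by-line way text is kept in a PDF.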

9 August 2013

