The PDF file format

The average user never needs to look ‘under the hood’ of PDF files. For curious people, this page takes a closer look at the way information is stored in a PDF file.

General conventions

Here is some useful information in case you intend to open PDF files and edit them directly:

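  • PDF files are based on 7-bit ASCII text, although they may also contain binary data. They can be opened in a plain text editor such as Notepad. Depending on compression and encryption settings, the content may or may not be readable. Be careful when saving: an editor that changes line endings or character encoding shifts byte positions and can corrupt the file.
  • Lines outside of stream data should not exceed 255 characters.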
  • Every line ends with a carriage return, a line feed or a carriage return followed by a line feed (depending upon the application or platform used to create the PDF file).
  • PDF is case sensitive.
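The line-ending and character-range rules above are easy to check mechanically. The Python sketch below is a minimal illustration under our own assumptions: `split_pdf_lines`, `is_seven_bit` and `long_lines` are hypothetical helper names, not part of any PDF library, and `long_lines` does not skip stream data, so binary streams may be flagged even though the 255-character guideline does not apply to them.

```python
import re
from typing import List

# Matches the three end-of-line conventions. CR LF is listed first so that
# a CR immediately followed by LF counts as one line break, not two.
EOL = re.compile(rb"\r\n|\r|\n")


def split_pdf_lines(data: bytes) -> List[bytes]:
    """Split raw PDF bytes into lines, accepting CR, LF or CR LF endings."""
    return EOL.split(data)


def is_seven_bit(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0-127)."""
    return data.isascii()


def long_lines(data: bytes, limit: int = 255) -> List[int]:
    """Return the indices of lines longer than `limit` bytes.

    Simplification: stream data is not skipped, so binary streams
    may show up here even though the limit does not apply to them.
    """
    return [i for i, line in enumerate(split_pdf_lines(data)) if len(line) > limit]


# Mixed line endings are all treated as single line breaks.
print(split_pdf_lines(b"%PDF-1.2\r1 0 obj\r\n<< >>\nendobj"))
```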

File structure

PDF files use a fixed structure; they always contain four sections:

  • A header, which states the version of the PDF specification the file adheres to. This line looks like this: ‘%PDF-1.2’. The ‘1.2’ can also be ‘1.0’ or ‘1.1’ for older versions of the PDF standard, or a higher number for newer ones.
  • The body, which contains the objects describing the various elements used on the pages, such as fonts, images and page content.
  • A cross-reference table, which lists the byte position of every object in the body, so applications can locate each element without reading the whole file.
  • A trailer, which tells applications or RIPs where to find the cross-reference table. The file always ends with ‘%%EOF’. If this line is missing, the PDF file is incomplete and probably cannot be processed by any RIP or application. This is not the case with PostScript files: if the last few lines of a PostScript file are missing (because a connection was lost while transferring the file, or a computer crashed), you can often still print most of the pages. With a truncated PDF file, you will typically lose everything.
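To make the four sections concrete, here is a minimal Python sketch that assembles a blank, one-page PDF byte by byte and then runs a simple version of the completeness check described for the trailer. It assumes a hand-picked set of three objects (catalog, page tree, page); `build_minimal_pdf` and `inspect_pdf` are hypothetical names for illustration, and the checks are far simpler than what a real RIP or viewer does.

```python
from typing import List, Tuple


def build_minimal_pdf() -> bytes:
    """Assemble a blank one-page PDF from its four sections (a sketch)."""
    # Header: names the version of the PDF specification.
    out = bytearray(b"%PDF-1.2\n")

    # Body: the objects that make up the document (hand-picked for this sketch).
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Resources << >>"
        b" /MediaBox [0 0 612 792] >>\nendobj\n",
    ]
    offsets: List[int] = []
    for obj in objects:
        offsets.append(len(out))  # byte position where this object starts
        out += obj

    # Cross-reference table: one fixed-width, 20-byte entry per object.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    # Trailer: names the root object and points back to the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset  # "%%%%" renders as "%%"
    return bytes(out)


def inspect_pdf(data: bytes) -> Tuple[str, int]:
    """Return (version, xref_offset), or raise if the file looks broken."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header")
    version = data.split(None, 1)[0][5:].decode("ascii")
    if not data.rstrip().endswith(b"%%EOF"):
        raise ValueError("no %%EOF marker: file is probably incomplete")
    # The number after the last 'startxref' keyword is the xref position.
    pos = data.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref keyword")
    xref_offset = int(data[pos + len(b"startxref"):].split()[0])
    return version, xref_offset


pdf = build_minimal_pdf()
print(inspect_pdf(pdf))
```

All positions are counted in bytes, which is why an editor that rewrites line endings can break a PDF: the cross-reference entries then point to the wrong places.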

Modifying data

If data are appended to a PDF file (for instance because the user edited text in Adobe Acrobat and saved the file again), another body area, cross-reference table and trailer are added to the end of the file, while the original bytes stay in place. If you select the ‘Optimize’ option in the ‘Save As’ dialog, Acrobat Exchange cleans up the PDF file so that it no longer contains multiple body, cross-reference and trailer sections.
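Appending data this way is known as an incremental update. The Python sketch below shows the idea under simplifying assumptions: it replaces a single object, keeps generation number 0, takes the catalog reference and the `/Size` value from the caller, and links the new trailer to the previous cross-reference table through the `/Prev` key. `append_update` and `last_startxref` are hypothetical helpers, not an Acrobat or library API.

```python
def last_startxref(pdf: bytes) -> int:
    """Read the byte offset that follows the final 'startxref' keyword."""
    pos = pdf.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref keyword")
    return int(pdf[pos + len(b"startxref"):].split()[0])


def append_update(pdf: bytes, obj_num: int, new_body: bytes,
                  size: int, root: bytes = b"1 0 R") -> bytes:
    """Append a replacement for one object as an incremental update.

    The original bytes are left untouched; a new body, cross-reference
    section and trailer are added after the old %%EOF.
    """
    prev_xref = last_startxref(pdf)
    out = bytearray(pdf)
    if not out.endswith(b"\n"):
        out += b"\n"

    # New body: a fresh copy of the object, same number, generation 0.
    obj_offset = len(out)
    out += b"%d 0 obj\n%s\nendobj\n" % (obj_num, new_body)

    # New cross-reference section listing only the changed object.
    xref_offset = len(out)
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)

    # New trailer: /Prev links back to the previous cross-reference table.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (size, root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

A reader starts from the last trailer and follows the `/Prev` chain, so the newest copy of each object wins. An optimizing save instead writes a single body, cross-reference table and trailer from scratch.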
