Compression in PDF files

PDF files are much smaller than equivalent PostScript files. This is partly due to a better data structure, but mainly to the efficient compression algorithms that PDF supports.

All text and the PostScript-like operators that are part of a PDF can be compressed using the LZW algorithm or, from PDF 1.2 onwards, Flate. This basic compression can reduce the file size of a PDF to about half the size of an equivalent PostScript file.

If compression is switched on in Acrobat Distiller, Distiller will first decompress all images in a PostScript file and then recompress them while creating a PDF file.

For images, various compression algorithms can be used in PDF files:

  • JPEG & JPEG2000
  • ZIP
  • CCITT G3/G4
  • RLE
  • Flate
  • JBIG2

Most of these compression algorithms are discussed in more detail in the data compression section of this site.

How to check the compression that was used in a PDF

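In some cases you can open the PDF in a text editor that can handle binary data (TextPad, UltraEdit, …) and search for the “/Filter” keyword. The name that follows it identifies the compression: /DCTDecode (JPEG), /JPXDecode (JPEG2000), /FlateDecode (ZIP/Flate), /LZWDecode (LZW), /CCITTFaxDecode (CCITT), /RunLengthDecode (RLE) or /JBIG2Decode (JBIG2).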
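The same search can be scripted. The following is a minimal sketch in Python, not a full PDF parser: it simply scans the raw bytes for “/Filter” entries and tallies the filter names that follow. The function name count_filters is made up for this illustration. In newer PDFs, some dictionaries are themselves stored inside compressed streams, so the script may miss entries, just like a manual search would.

```python
import re
from collections import Counter

# Filter names defined by the PDF specification, mapped to the
# compression method each one stands for.
FILTER_NAMES = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG2000",
    b"FlateDecode": "Flate/ZIP",
    b"LZWDecode": "LZW",
    b"CCITTFaxDecode": "CCITT G3/G4",
    b"RunLengthDecode": "RLE",
    b"JBIG2Decode": "JBIG2",
}

def count_filters(pdf_bytes):
    """Count /Filter entries in raw PDF bytes, grouped by compression method.

    Hypothetical helper for illustration: a plain byte search, so filters
    declared inside compressed object streams are not seen.
    """
    counts = Counter()
    # Matches "/Filter /Name" as well as "/Filter [/Name1 /Name2]".
    pattern = rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)"
    for match in re.finditer(pattern, pdf_bytes):
        for name in re.findall(rb"/([A-Za-z0-9]+)", match.group(1)):
            # Unknown names (e.g. ASCII85Decode) are reported as-is.
            counts[FILTER_NAMES.get(name, name.decode("ascii"))] += 1
    return dict(counts)
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example count_filters(open("example.pdf", "rb").read()), where example.pdf is a placeholder name.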

JPEG compression

JPEG compression is used for colour and grayscale images. The algorithm can be either lossy or lossless, but in Acrobat only lossy JPEG compression is available. This means that some of the detail of the image is lost by compressing it. The higher the compression ratio, the more detail you lose.

More information on the JPEG compression algorithm can be found on this page.

From Acrobat Distiller 4 onwards, there are 5 different levels of compression:

  • Minimum, with a quality loss that is acceptable for all but the most demanding jobs. Average compression ratio: 1/2
  • Low
  • Medium, acceptable for low-quality work. Average compression ratio: 1/5
  • High
  • Maximum, no longer acceptable for prepress work. Average compression ratio: 1/10

Don’t recompress images that have already been compressed using JPEG, because this leads to additional loss of information. If you distill a file that contains JPEG-compressed images, Distiller will decompress these and then recompress them according to your settings, which further degrades image quality.

Once a PDF is created, you can still alter the compression ratio. This can be handy when a file is too big to e-mail or upload. Remember the disadvantage: when you recompress the data, there is additional loss of detail! If possible, redistill the original source file.

  • In Acrobat Professional 7 & later, there is a menu option called PDF Optimizer which allows you to recompress all data in the PDF.
  • There are a number of Acrobat plug-ins that can recompress data. I particularly like Quite-a-box-of-tricks from Quite software but there are others available.

JPEG2000 compression

This new compression algorithm is supported from PDF version 1.5 (Acrobat 6.0) onwards. You can find more information about the compression algorithm on this page. Even though it is more efficient than JPEG compression, JPEG2000 isn’t used that much yet because of compatibility issues with older systems.

ZIP compression

The ZIP algorithm is also used in popular PC applications like PKzip, WinZIP or StuffIt. Selecting ZIP compression does not mean that Acrobat will create a ZIPped file; it just uses the algorithm to compress grayscale or colour images.

ZIP is a somewhat smarter version of LZW compression. It is a lossless algorithm, which means that the content of your images does not change when they are compressed. If you are still using Acrobat 3, it offers options for so-called 4-bit and 8-bit ZIP compression. 4-bit ZIP compression means that Acrobat first reduces the number of levels from 256 per channel to 16 per channel and then performs a lossless ZIP compression. This leads to an excellent compression ratio, but the quality of the images suffers enormously. Avoid 4-bit ZIP compression unless you are sure your document lends itself to it. 8-bit ZIP compression is completely lossless. From Acrobat 4 onwards, ZIP compression is always 8-bit.
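The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under assumptions: how Acrobat 3 actually reduced the samples is not described here, so the sketch simply keeps the top 4 bits of each 8-bit sample, and it uses Python's zlib module as a stand-in for the ZIP step. The helper names are made up.

```python
import zlib

def quantize_to_4_bits(samples):
    """Reduce 8-bit samples (0-255) to 16 levels by keeping the high 4 bits.

    Assumption for illustration; each 4-bit value is stored in its own byte.
    """
    return bytes(s >> 4 for s in samples)

def expand_to_8_bits(samples):
    """Scale 4-bit samples (0-15) back to the 0-255 range."""
    return bytes(s * 17 for s in samples)

# A smooth 8-bit gradient: 256 distinct levels in one channel.
gradient = bytes(range(256))

# "8-bit ZIP": the round trip returns the original bytes exactly.
restored_8bit = zlib.decompress(zlib.compress(gradient))

# "4-bit ZIP": quantize first (the lossy step), then compress losslessly.
packed = zlib.compress(quantize_to_4_bits(gradient))
restored_4bit = expand_to_8_bits(zlib.decompress(packed))

# Number of distinct levels that survive the 4-bit round trip.
distinct_levels = len(set(restored_4bit))
```

The ZIP step itself never loses anything in either case. All of the damage in the 4-bit mode comes from the reduction to 16 levels that happens before compression, which is why smooth gradients turn into visible bands.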

CCITT compression

CCITT compression can be used for black-and-white images. It is the same compression algorithm that is used in fax devices. It is lossless, meaning it will not affect the quality of your images.

Acrobat offers CCITT group 3 or group 4 compression. Everyone seems to agree that CCITT group 4 is preferable. You can leave it switched ‘on’ all the time.

RLE compression

RLE stands for Run Length Encoding. It is a lossless algorithm so it will not change the quality of your images. More information about the algorithm can be found on this page.
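As a rough illustration of the idea, here is a small Python sketch of run-length encoding in the byte-oriented layout used by PDF's RunLengthDecode filter: a length byte from 0 to 127 means "copy the next length + 1 bytes literally", a length byte from 129 to 255 means "repeat the next byte 257 - length times", and 128 marks the end of the data. The function names are made up, and the encoder is a simple one rather than an optimal one.

```python
def rle_encode(data):
    """Encode bytes in the layout of PDF's RunLengthDecode filter."""
    out = bytearray()
    i = 0
    while i < len(data):
        # Measure the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < len(data) and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])  # 129..255: repeat next byte
            i += run
        else:
            # Collect literal bytes until a run of 2 or more starts (at most 128).
            start = i
            i += 1
            while (i < len(data) and i - start < 128
                   and not (i + 1 < len(data) and data[i] == data[i + 1])):
                i += 1
            out += bytes([i - start - 1]) + data[start:i]  # 0..127: literal copy
    out.append(128)  # end-of-data marker
    return bytes(out)

def rle_decode(encoded):
    """Decode data produced by rle_encode."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        length = encoded[i]
        if length == 128:          # end-of-data marker
            break
        if length < 128:           # literal: copy length + 1 bytes
            out += encoded[i + 1:i + 2 + length]
            i += 2 + length
        else:                      # run: repeat one byte 257 - length times
            out += encoded[i + 1:i + 2] * (257 - length)
            i += 2
    return bytes(out)

# One scan line with long white and black runs, then a little noise.
row = b"\x00" * 100 + b"\xff" * 20 + b"\x00\xff\x00"
encoded = rle_encode(row)
```

Long runs of identical bytes shrink to two bytes each, while data without repetition grows slightly because of the extra length bytes. That is why RLE only pays off on images with large uniform areas.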

In Acrobat, RLE compression can be used for black-and-white images. Most people seem to prefer CCITT compression to RLE because it is more efficient.

Flate compression

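Flate (or deflate as it is sometimes called) is a rather complex compression algorithm. It is the algorithm behind the ZIP option in Acrobat, and in the PDF itself it shows up as the /FlateDecode filter. Read this page to learn more about it.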
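Flate data in a PDF uses the zlib format, so Python's standard zlib module is enough for a quick experiment. The sketch below compresses a made-up page content stream (the operators and text are invented for the example) and checks that decompression returns exactly the original bytes.

```python
import zlib

# A made-up page content stream: the same text operators repeated many times.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 200

compressed = zlib.compress(content, 9)   # roughly what a /FlateDecode stream holds
restored = zlib.decompress(compressed)   # what a PDF reader does when it opens the stream

ratio = len(compressed) / len(content)
print(f"{len(content)} bytes -> {len(compressed)} bytes ({ratio:.1%} of the original)")
```

Real content streams are far less repetitive than this sample, so the savings in practice are much more modest.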

JBIG2 compression

JBIG2 is an alternative to CCITT compression for black-and-white images. The corresponding filter, /JBIG2Decode, is part of the PDF specification from PDF 1.4 (Acrobat 5) onwards. I haven’t seen it used yet, which could be because of feedback from some users that the implementation in Acrobat is much slower than CCITT G4 compression.
