PDF files can be fairly compact, much smaller than the equivalent PostScript files. This is partly due to a better data structure, but mainly to the very efficient compression algorithms that PDF supports. The list of compression algorithms that can be used is extensive:
- CCITT G3/G4 – used for monochrome images
- JPEG – a lossy algorithm that is used for images
- JPEG2000 – a more modern alternative to JPEG, which is also used for compressing images
- Flate – used for compressing text as well as images
- JBIG2 – an alternative to CCITT compression for monochrome images
- LZW – used for compressing text as well as images, but it is being replaced by Flate
- RLE – used for monochrome images
- ZIP – used for grayscale or color images
The use of these compression algorithms is discussed in more detail in the bottom section of this page.
How to compress PDF files
If you save publications to PDF in applications like Adobe InDesign or Photoshop, the Save menu provides options to determine which data get compressed and how this is done. If compression is switched on in Acrobat Distiller, Distiller will first decompress all images in a PostScript file and then recompress them while creating a PDF file.
How to check the compression that was used in a PDF
In some cases, you can open the PDF using a text editor which can handle binary data (TextPad, UltraEdit,…) and search for the “/Filter” keywords.
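The same search can be automated. The following is a minimal sketch, not a full PDF parser: it scans the raw bytes for `/Filter` entries and counts the filter names that follow. The function name `list_filters` and the sample bytes are made up for illustration. In the PDF format the filter names are FlateDecode (Flate/ZIP), LZWDecode, DCTDecode (JPEG), JPXDecode (JPEG2000), CCITTFaxDecode, JBIG2Decode and RunLengthDecode. Filters that sit inside compressed object streams are not visible to this kind of scan, which is one reason the approach only works in some cases.

```python
import re

def list_filters(pdf_bytes: bytes) -> dict:
    """Count the filter names that follow each /Filter key in raw PDF bytes."""
    counts = {}
    # Matches both "/Filter /Name" and "/Filter [/NameA /NameB]".
    pattern = rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)"
    for match in re.finditer(pattern, pdf_bytes):
        for name in re.findall(rb"/([A-Za-z0-9]+)", match.group(1)):
            key = name.decode("ascii")
            counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical fragment of two stream dictionaries, for illustration only.
sample = (b"<< /Length 42 /Filter /FlateDecode >> "
          b"<< /Filter [/ASCII85Decode /DCTDecode] >>")
print(list_filters(sample))
```

To inspect a real file, read it in binary mode first, for example `list_filters(open("file.pdf", "rb").read())`, where `file.pdf` is a placeholder name.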
The use of compression algorithms in PDF files
CCITT compression can be used for black-and-white images. It is the same compression algorithm that is also used in fax devices. It is lossless, meaning it will not affect the quality of your images.
Acrobat offers CCITT group 3 or group 4 compression. Group 4 is generally considered preferable, and you can leave it switched ‘on’ all the time.
Flate (or deflate as it is sometimes called) is a rather complex compression algorithm. Read this page to learn more about it.
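As a rough illustration of what Flate does to the text-like part of a PDF, the sketch below uses Python's standard `zlib` module, which implements the same deflate method. The sample content stream is invented for the example, and its heavy repetition makes the result more favorable than a real page would be.

```python
import zlib

# Hypothetical page content: the same text-drawing operators repeated 50 times.
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 50

compressed = zlib.compress(content, 9)   # 9 = best compression, slowest
restored = zlib.decompress(compressed)

print(len(content), "->", len(compressed), "bytes")
# Flate is lossless: the round trip returns the exact original bytes.
print("identical after round trip:", restored == content)
```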
JBIG2 is an alternative to CCITT compression for monochrome images. The PDF format supports it from version 1.4 (Acrobat 5) onwards. I haven’t seen it used yet, which could be because of feedback from some users that the implementation in Acrobat is much slower than CCITT G4 compression.
JPEG compression is used for color and grayscale images. It is a compression algorithm that can be either lossy or lossless. In Acrobat, only lossy JPEG compression is available. This means that some of the detail of the image is lost by compressing it. The higher the compression ratio, the more detail you lose.
More information on the JPEG compression algorithm can be found on this page.
From Acrobat Distiller 4 onwards, there are 5 different levels of compression. Three of them are:
- Minimum, with a quality loss that will be acceptable for everything but the most demanding jobs. Average compression ratio: 1/2
- Medium, acceptable for low-quality work. Average compression ratio: 1/5
- Maximum: not acceptable for prepress use. Average compression ratio: 1/10
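To get a feel for what these ratios mean, the short calculation below applies them to a hypothetical 12 MB image. The ratios are the averages quoted in the list above. The image size and the helper name `estimate_size` are assumptions for illustration, and real results vary with image content.

```python
def estimate_size(original_mb: float, ratio: float) -> float:
    """Estimated size after compression, given an average compression ratio."""
    return original_mb * ratio

# Average ratios as quoted in the list above.
ratios = {"Minimum": 1 / 2, "Medium": 1 / 5, "Maximum": 1 / 10}
original_mb = 12.0  # hypothetical uncompressed image size

for level, ratio in ratios.items():
    print(f"{level}: about {estimate_size(original_mb, ratio):.1f} MB")
```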
Don’t recompress images that have already been compressed using JPEG, because this leads to an additional loss of information. If you distill a file that contains JPEG compressed images, Distiller will decompress these and then recompress them according to your settings, which degrades the image quality further.
Once a PDF is created, you can still alter the compression ratio. This can be handy when a file is too big to e-mail or upload. Remember the disadvantage: when you recompress the data, there is an additional loss of detail! If possible, redistill the original source file.
- In Acrobat Professional 7 and later, there is a menu option called PDF Optimizer which allows you to recompress all data in the PDF.
- There are a number of Acrobat plug-ins that can recompress data. I particularly like Quite-a-box-of-tricks from Quite software but there are others available.
JPEG2000 is a fairly new compression algorithm that is supported from PDF version 1.5 (Acrobat 6.0) onwards. You can find more information about the compression algorithm on this page. Even though it is more efficient than JPEG compression, JPEG2000 isn’t used that much yet because of its CPU overhead and compatibility issues with older systems.
All text and the operators that are part of a PDF can be compressed using an LZW algorithm. This basic compression can reduce the file size of a PDF to about half the size of an equivalent PostScript file.
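The idea behind LZW can be shown with a small sketch: the encoder builds a table of byte sequences it has already seen and emits one code per known sequence. This is a simplified textbook version and not the exact encoding stored in a PDF, which additionally packs the codes into 9 to 12 bits and uses special clear-table and end-of-data codes. The function names are chosen for this example.

```python
def lzw_compress(data: bytes) -> list:
    """Textbook LZW: return a list of integer codes for the input bytes."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending a known sequence
        else:
            codes.append(table[current])   # emit the longest known sequence
            table[candidate] = next_code   # and remember the new, longer one
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decompress(codes: list) -> bytes:
    """Rebuild the original bytes by re-creating the encoder's table."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the sequence that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)

sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), "bytes ->", len(codes), "codes")
print("round trip ok:", lzw_decompress(codes) == sample)
```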
RLE stands for Run Length Encoding. It is a lossless algorithm so it will not change the quality of your images. More information about the algorithm can be found on this page.
In Acrobat, RLE compression can be used for black-and-white images. Most people seem to prefer CCITT compression to RLE because it is more efficient.
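The principle of run-length encoding is easy to sketch: consecutive identical pixels are stored as a single value plus a count. The example below works on a list of 0/1 pixel values and stores (value, count) pairs. This is a simplified illustration of the idea, not the byte-level format that PDF uses, and the function names are chosen for this example.

```python
def rle_encode(pixels: list) -> list:
    """Collapse runs of identical values into (value, run_length) pairs."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([pixel, 1])   # start a new run
    return [(value, count) for value, count in runs]

def rle_decode(runs: list) -> list:
    """Expand (value, run_length) pairs back into the original values."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels

# One row of a hypothetical monochrome image: white, a short black mark, white.
row = [0] * 12 + [1] * 3 + [0] * 9
encoded = rle_encode(row)
print(encoded)                      # [(0, 12), (1, 3), (0, 9)]
print(rle_decode(encoded) == row)   # True
```

Long uniform runs, which are typical for line art and scanned text, compress well this way. Data that changes at every pixel does not.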
The ZIP algorithm is also used in popular PC applications like PKzip, WinZIP or StuffIt. When you select ZIP compression, this does not mean that Acrobat will create a ZIPped file. It simply uses the algorithm to compress grayscale or color images.
ZIP is a somewhat smarter version of LZW compression. It is a lossless algorithm, which means that the content of your images is not changed by compressing them. If you are still using Acrobat 3, it offers options for so-called 4-bit and 8-bit ZIP compression. 4-bit ZIP compression means that Acrobat will first reduce the number of levels from 256 per channel to 16 per channel and then perform a lossless ZIP compression. This leads to an excellent compression ratio, but the quality of the images suffers enormously. Avoid 4-bit ZIP compression unless you are sure your document lends itself to it. 8-bit ZIP compression is completely lossless. From Acrobat 4 onwards, ZIP compression is always 8-bit.
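The difference between the two modes can be sketched in a few lines. The example below uses Python's `zlib` module as a stand-in for ZIP compression and a made-up one-channel gradient as image data. The helper `to_4bit` is hypothetical: it reduces each sample to 16 levels but keeps 8-bit storage for simplicity, which is an assumption about the principle and not a description of how Acrobat 3 actually stores the result.

```python
import zlib

def to_4bit(samples: bytes) -> bytes:
    """Reduce each 8-bit sample (256 levels) to one of 16 levels.

    The 16 levels are spread back over 0..255 (0, 17, 34, ..., 255),
    so the data stays one byte per sample.
    """
    return bytes((sample >> 4) * 17 for sample in samples)

# Hypothetical one-channel image data: a smooth gradient repeated 64 times.
gradient = bytes(range(256)) * 64
reduced = to_4bit(gradient)

print("distinct levels:", len(set(gradient)), "->", len(set(reduced)))
print("compressed sizes:", len(zlib.compress(gradient)), "vs", len(zlib.compress(reduced)))

# The compression step itself is lossless in both cases...
print(zlib.decompress(zlib.compress(reduced)) == reduced)
# ...but the reduction to 16 levels has already thrown detail away.
print(reduced == gradient)
```

The loss happens entirely in the level-reduction step, before the lossless compression runs. A smooth gradient turns into visible bands, which is why this mode only suits images that use few tonal levels to begin with.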