PDF files can be fairly compact, much smaller than the equivalent PostScript files. This is partly due to a more efficient data structure, but mainly to the very efficient compression algorithms that PDF supports. The list of compression algorithms that can be used is extensive:
- CCITT G3/G4 – used for monochrome images
- JPEG – a lossy algorithm that is used for images
- JPEG2000 – a more modern alternative to JPEG, which is also used for compressing images
- Flate – used for compressing text as well as images
- JBIG2 – an alternative to CCITT compression for monochrome images
- LZW – used for compressing text as well as images, but being replaced by Flate
- RLE – used for monochrome images
- ZIP – Acrobat's name for Flate compression of grayscale or color images
The use of these compression algorithms is discussed in more detail in the bottom section of this page.
How to compress PDF files
If you save or export publications to PDF in applications like Adobe InDesign or Photoshop, the PDF dialog provides options that determine which data get compressed and how. If compression is switched on in Acrobat Distiller, Distiller first decompresses all images in a PostScript file and then recompresses them while creating the PDF file.
How to check the compression that was used in a PDF
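In some cases, you can open the PDF in a text editor that can handle binary data (TextPad, UltraEdit,…) and search for the “/Filter” keyword. The name that follows it identifies the compression, for example /FlateDecode (Flate or ZIP), /LZWDecode, /RunLengthDecode, /CCITTFaxDecode, /JBIG2Decode, /DCTDecode (JPEG) or /JPXDecode (JPEG2000).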
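The same check can be scripted. The sketch below is a rough heuristic, not a PDF parser: it scans the raw bytes for /Filter entries (a single name or an array of names) and counts the filter names it finds. It can miss filters on inline images, which sit inside compressed content streams and use the abbreviated /F key, and it may occasionally match stray bytes inside binary data. The function name `count_filters` is hypothetical.

```python
import re
import sys
from collections import Counter

# Matches "/Filter /Name" or "/Filter [/Name1 /Name2 ...]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count filter names that follow /Filter keys (a heuristic, not a parser)."""
    counts: Counter = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        # A filter value is either one name or an array of names (a filter chain).
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_filters.py some.pdf
    with open(sys.argv[1], "rb") as f:
        for name, n in count_filters(f.read()).most_common():
            print(f"{name}: {n}")
```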
The use of compression algorithms in PDF files
CCITT compression can be used for black-and-white images. It is the same compression algorithm that is used in fax machines. It is lossless, meaning it will not affect the quality of your images.
Acrobat offers CCITT Group 3 and Group 4 compression. Group 4 is generally preferred because it compresses better, and you can leave it switched ‘on’ all the time.
Flate (sometimes called Deflate) is a rather complex lossless compression algorithm that combines LZ77 dictionary coding with Huffman coding. Read this page to learn more about it.
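In a PDF, Flate-compressed streams use the /FlateDecode filter, whose data is the zlib format (RFC 1950) wrapped around Deflate (RFC 1951). Python's standard `zlib` module produces exactly that format, so a minimal sketch of compressing a page content stream looks like this. The content stream itself is a made-up example.

```python
import zlib

# A hypothetical page content stream: the same text operators repeated many times.
content = b"BT /F1 12 Tf 72 712 Td (Hello, PDF) Tj ET\n" * 200

# zlib.compress emits zlib-wrapped Deflate data, the format /FlateDecode expects.
compressed = zlib.compress(content, 9)

# Flate is lossless: decompressing restores every byte.
assert zlib.decompress(compressed) == content
print(f"{len(content)} bytes -> {len(compressed)} bytes")
```

Repetitive text and operators like these compress very well, which is why Flate is the usual choice for content streams.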
JBIG2 is an alternative to CCITT compression for monochrome (black-and-white) images. The JBIG2Decode filter was introduced in PDF 1.4 (Acrobat 5). I haven’t seen it used yet, which could be because some users report that the implementation in Acrobat is much slower than CCITT G4 compression.
JPEG compression is used for color and grayscale images. The JPEG standard defines both lossy and lossless modes, but Acrobat only offers lossy JPEG compression. This means that some of the detail of the image is lost by compressing it: the higher the compression ratio, the more detail you lose.
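The detail loss comes from quantization. JPEG transforms each 8×8 block of pixels with a discrete cosine transform (DCT), divides the coefficients by step sizes taken from a quantization table, and rounds the results. The sketch below is a simplified, hypothetical one-dimensional version with a single step size (real JPEG uses 2D blocks, a per-coefficient quantization table and entropy coding). It shows the principle: larger steps round more coefficients to zero, which compresses better but discards more detail.

```python
import math

N = 8  # JPEG works on 8x8 blocks; this sketch uses a single row of 8 samples


def dct(samples):
    """Orthonormal DCT-II of N samples."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(x * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                               for n, x in enumerate(samples)))
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    out = []
    for n in range(N):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
            total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(total)
    return out


def jpeg_like_roundtrip(samples, step):
    """Quantize the DCT coefficients with one step size, then reconstruct."""
    quantized = [round(c / step) for c in dct(samples)]  # the lossy step
    restored = idct([q * step for q in quantized])
    return quantized, restored


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical 8-bit pixel values
for step in (2, 10, 40):
    q, restored = jpeg_like_roundtrip(row, step)
    nonzero = sum(1 for v in q if v != 0)
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(f"step={step}: {nonzero} nonzero coefficients, max error {max_err:.1f}")
```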
More information on the JPEG compression algorithm can be found on this page.
From Acrobat Distiller 4 onwards, there are 5 different levels of JPEG compression, including:
- Minimum compression: a quality loss that is acceptable for everything but the most demanding jobs. Average compression ratio: 1/2
- Medium compression: acceptable for low-quality work. Average compression ratio: 1/5
- Maximum compression: not acceptable for prepress use. Average compression ratio: 1/10
Don’t recompress images that have already been compressed using JPEG: this leads to additional loss of information. If you distill a file that contains JPEG-compressed images, Distiller decompresses them and then recompresses them according to your settings, which further reduces image quality.
Once a PDF is created, you can still change the compression ratio. This can be handy when a file is too big to email or upload. Remember the disadvantage: when you recompress the data, there is an additional loss of detail! If possible, redistill the original source file.
- In Acrobat Professional 7 & later, there is a menu option called PDF Optimizer which allows you to recompress all data in the PDF.
- There are a number of Acrobat plug-ins that can recompress data. I particularly like Quite-a-box-of-tricks from Quite software but there are others available.
JPEG2000 is a fairly new compression algorithm that is supported from PDF version 1.5 (Acrobat 6.0) onwards. You can find more information about the compression algorithm on this page. Even though it compresses more efficiently than JPEG, JPEG2000 isn’t used much yet because of its CPU overhead and compatibility issues with older systems.
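JPEG2000 is built on a wavelet transform rather than JPEG's DCT, and its lossless mode uses the reversible 5/3 wavelet, computed with integer "lifting" steps. The sketch below shows one level of that transform on a single row of integers, with symmetric extension at the edges. It only illustrates the transform (real encoders apply it in 2D over several levels and follow it with quantization and entropy coding); the function names are hypothetical.

```python
def forward_53(x):
    """One level of the reversible 5/3 wavelet on an even-length list of integers.

    Returns (low-pass, high-pass) halves, using symmetric extension at the edges.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("expected an even number of samples (at least 2)")
    half = len(x) // 2
    mirrored = x[-2]  # symmetric extension: the sample past the end mirrors x[-2]
    # Predict step: each odd sample minus the floored average of its even neighbours.
    d = [x[2 * k + 1] - (x[2 * k] + (x[2 * k + 2] if k + 1 < half else mirrored)) // 2
         for k in range(half)]
    # Update step: adjust even samples from neighbouring details (d[-1] mirrors d[0]).
    s = [x[2 * k] + (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    return s, d


def inverse_53(s, d):
    """Undo forward_53 exactly by running the lifting steps in reverse order."""
    half = len(s)
    even = [s[k] - (d[max(k - 1, 0)] + d[k] + 2) // 4 for k in range(half)]
    x = []
    for k in range(half):
        right = even[k + 1] if k + 1 < half else even[half - 1]
        x.append(even[k])
        x.append(d[k] + (even[k] + right) // 2)
    return x


ramp = [10, 12, 14, 16, 18, 20, 22, 24]
low, high = forward_53(ramp)
assert inverse_53(low, high) == ramp  # integer lifting is exactly reversible
print(low, high)
```

On smooth data the high-pass values are mostly zero, which is what makes the transformed image easier to compress.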
The text and operators in a PDF's content streams can be compressed using the LZW algorithm. This basic compression can reduce the file size of a PDF to about half the size of an equivalent PostScript file.
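LZW builds a dictionary of byte sequences as it reads the data and replaces each repeated sequence with a single code. The sketch below shows the core encoder and decoder. It deliberately omits parts of PDF's LZWDecode format: there, codes are packed into 9 to 12 bit fields, code 256 clears the table when it fills up, and code 257 marks the end of the data. This sketch only reserves those two code numbers and lets the table grow without limit.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Return LZW codes for data (codes only, no bit packing)."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 258  # 256 (clear table) and 257 (end of data) are reserved, as in PDF
    phrase = b""
    codes = []
    for byte in data:
        candidate = phrase + bytes([byte])
        if candidate in table:
            phrase = candidate  # keep extending a sequence we already know
        else:
            codes.append(table[phrase])
            table[candidate] = next_code  # learn the new sequence
            next_code += 1
            phrase = bytes([byte])
    if phrase:
        codes.append(table[phrase])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary while decoding, so no table is stored in the file."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 258
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            entry = previous + previous[:1]  # code defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(text)
assert lzw_decode(codes) == text
print(len(text), "bytes ->", len(codes), "codes")
```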
RLE stands for Run Length Encoding. It is a lossless algorithm so it will not change the quality of your images. More information about the algorithm can be found on this page.
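In PDF, RLE is the RunLengthDecode filter. Its format is simple: a length byte from 0 to 127 means "copy the next length + 1 bytes literally", a length byte from 129 to 255 means "repeat the next byte 257 − length times", and 128 marks the end of the data. The sketch below implements a greedy encoder and a decoder for that format; the function names are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data in the RunLengthDecode format (greedy runs and literals)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure how often data[i] repeats (at most 128 bytes per run).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)  # 129..255: repeat the next byte
            out.append(data[i])
            i += run
        else:
            # Collect literal bytes until a run starts or 128 literals are gathered.
            start = i
            i += 1
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)  # 0..127: copy the next length + 1 bytes
            out += data[start:i]
    out.append(128)  # end-of-data marker
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    """Decode RunLengthDecode data."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        if length == 128:
            break
        if length < 128:
            out += data[i + 1:i + 2 + length]
            i += 2 + length
        else:
            out += bytes([data[i + 1]]) * (257 - length)
            i += 2
    return bytes(out)


scanline = b"\x00" * 300 + b"abc" + b"\xff" * 5  # hypothetical image data
encoded = rle_encode(scanline)
assert rle_decode(encoded) == scanline  # RLE is lossless
print(len(scanline), "bytes ->", len(encoded), "bytes")
```

Long runs of identical bytes, common in black-and-white images, shrink dramatically; data without runs gets slightly larger.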
In Acrobat, RLE compression can be used for black-and-white images. Most people prefer CCITT compression to RLE because it compresses better.
The ZIP algorithm is also used in popular PC applications like PKZIP, WinZip or StuffIt. Selecting ZIP compression does not mean that Acrobat creates a ZIP file; it only uses the algorithm to compress grayscale or color images, which then appear in the PDF with the /FlateDecode filter.
ZIP generally compresses better than LZW; both are dictionary-based algorithms from the Lempel-Ziv family. It is a lossless algorithm, which means that the content of your images will not change when they are compressed. If you are still using Acrobat 3, it offers options for so-called 4-bit and 8-bit ZIP compression. 4-bit ZIP compression means that Acrobat first reduces the number of tonal levels from 256 per channel to 16 per channel and then performs a lossless ZIP compression. This leads to an excellent compression ratio, but image quality suffers badly. Avoid 4-bit ZIP compression unless you are sure your document lends itself to it. 8-bit ZIP compression is completely lossless. From Acrobat 4 onwards, ZIP compression is always 8-bit.
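The difference between 4-bit and 8-bit ZIP compression can be sketched with Python's `zlib` (the same Deflate algorithm). This is an illustration under assumptions, not Acrobat 3's actual code: here the reduction to 16 levels simply keeps the top 4 bits of each sample and packs two samples per byte, and the pixel data is made up. The compression step itself is lossless in both cases; the loss happens entirely in the reduction.

```python
import random
import zlib


def to_4bit(pixels: bytes) -> bytes:
    """Keep only the top 4 bits of each 8-bit sample and pack two samples per byte."""
    if len(pixels) % 2:
        pixels += pixels[-1:]  # pad odd-length input by repeating the last sample
    return bytes((a & 0xF0) | (b >> 4) for a, b in zip(pixels[0::2], pixels[1::2]))


def from_4bit(packed: bytes) -> bytes:
    """Expand each 4-bit sample back to 0-255 (16 possible levels: 0, 17, ..., 255)."""
    out = bytearray()
    for byte in packed:
        out.append((byte >> 4) * 17)
        out.append((byte & 0x0F) * 17)
    return bytes(out)


rng = random.Random(0)
# Hypothetical 8-bit grayscale data: a smooth ramp with a little noise.
pixels = bytes(min(255, max(0, i // 16 + rng.randint(-3, 3))) for i in range(4096))

eight_bit = zlib.compress(pixels, 9)          # lossless: every value survives
four_bit = zlib.compress(to_4bit(pixels), 9)  # 16 levels first, then lossless
print(len(pixels), len(eight_bit), len(four_bit))
```

Decompressing the 8-bit version returns the original pixels exactly, while the 4-bit version can only return 16 distinct gray levels.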
6 thoughts on “Compression in PDF files”
Some points I’d add:
– Very few PDF generation programs – maybe none – use RLE. Because it kind of sucks.
– JBIG2 in lossless mode is pretty common in PDF, as it’ll outperform CCITT. It does have a lossy mode, which is almost never used because it has a known issue. It has an awkward way of introducing compression artifacts that substitute symbols – and as no-one wants their order for 8,800 liters of paint to turn into an order for 8,000 liters, it’s best avoided.
– While LZW is common in older PDFs, more recent software has been abandoning it in favor of DEFLATE.
– Bitmap images are commonly compressed with pre-filtering using a predictor. With the right predictor, this closely resembles the algorithm used by PNG. Unfortunately not compatible, but almost equal in performance.
– JPG2000 is common in PDF now – about the only place JPG2000 is common – but still not so common as JPG.
I did some informal study of this: https://birds-are-nice.me/publications/Inside%20PDFs.txt
There are a lot of programs that promise to compress PDF files and make them smaller, but of all the ones I tried, the only one I would personally endorse is… actually mine. I wrote it. Shameless enough, but https://birds-are-nice.me/software/minuimus.html
It’s lossless, and pretty capable – but Linux only, I never ported it to Windows.
thanks for the information
Thanks for this article. I just want to know whether there is a lossless algorithm to compress PDF files?
ZIP is a somewhat smarter version of LZW compression. It is a lossless algorithm.
You should really take a look at the JBIG2 format. It is amazing.
Excellent article! For anyone looking for PDF compression software, I highly recommend CVISION tech’s product PdfCompressor