PDF versus other file formats

A number of other file formats achieve similar things to PDF. Below I list some of them and try to explain how these alternatives differ from or resemble PDF.

PDF versus XPS

XPS is as yet the most serious alternative to PDF that has come to market. More information about XPS, including a comparison between both file formats, can be found on the separate XPS page.

PDF versus PostScript

PDF was developed by Adobe, the company that also created PostScript. In fact, PDF is based on PostScript. It uses the instruction set of PostScript itself but in a different way: while PostScript is really a programming language you could even use to write a chess program or word processor, PDF is more limited in its goal. It describes the layout of a document (using PostScript operators). As such, PDF resembles a database, rather than a programming language.
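To make the database comparison concrete, here is a minimal sketch in Python that assembles a one-page PDF by hand. The function name build_minimal_pdf, the "Hello" text, and the page size are illustrative assumptions, not anything taken from Adobe's tools. The point is the shape of the file: a set of numbered objects, a cross-reference table that records the byte offset of each object, and a page content stream that uses PostScript-like operators (BT, Tf, Td, Tj, ET) without any loops or procedures.

```python
from typing import List, Tuple


def build_minimal_pdf() -> Tuple[bytes, List[int]]:
    """Assemble a tiny one-page PDF and return it with its object offsets."""
    # Page description: PostScript-like operators, but no control flow.
    content = b"BT /F1 24 Tf 72 720 Td (Hello) Tj ET"

    # Numbered objects that refer to each other ("2 0 R" means object 2).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where this object starts
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    # Cross-reference table: one fixed-width, 20-byte entry per object.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the free-list head
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # Trailer: names the root object and where the table begins.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out), offsets


if __name__ == "__main__":
    pdf_bytes, _ = build_minimal_pdf()
    with open("hello.pdf", "wb") as handle:
        handle.write(pdf_bytes)
```

A viewer reads the offset stored after startxref, loads the table, and then looks up only the objects it needs. That is much closer to querying an index than to running a program, which is what a PostScript interpreter has to do.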

The main advantages of PDF over PostScript are:

  • PDF files tend to be smaller because of the more efficient compression algorithms that can be used. Algorithms such as JBIG2 and JPEG 2000 are not available in PostScript.
  • PDF files can easily be visualized using Adobe Reader, Adobe Acrobat, or other tools.
  • PDF files are easier to modify provided you have the proper tools.
  • PDF supports transparency and ICC-based color management.
  • PDF files are more device-independent. PostScript files are often created for a specific device and will generate PostScript errors if sent to another device.
  • PDF files can be more versatile than PostScript files: they can contain links to other data as well as interactive elements (multimedia, forms, 3D, etc.).

PDF versus HTML

PDF is often compared to HTML, the data format used to create web pages. Originally HTML was geared towards describing the structure of a document, rather than its appearance. The appearance of a web page was determined by the browser, not by the creator of the document. With the increasing popularity of the World Wide Web, newer versions of HTML focused more on the visual aspect of web pages, rather than their content. So in a way, HTML moved towards the goals that PDF tries to achieve.

At the same time, Adobe put more and more web functionality into PDF. We got the option to add Internet links to PDF documents. An Adobe Reader plug-in for web browsers like Netscape Navigator or Internet Explorer became available, and Acrobat 4 added an option to convert a website or part of a website to a PDF document. Adobe also provided a mechanism for byte streaming in PDF so you do not have to download an entire PDF file to see the first page of the document.

So PDF and HTML are becoming competing standards. Right now PDF is still more powerful when it comes to describing the appearance of documents, while HTML is better suited for low-speed Internet access. But it is perfectly feasible to use PDF on websites and to use HTML for a CD-ROM-based electronic catalog.

PDF versus XML

XML, the eXtensible Markup Language, is a data format that can be used to describe the content of documents (similar to SGML). It has received a lot of attention recently, mainly because its flexibility allows for easy integration with databases as well as Internet publishing and data exchange. XML does not really compete with PDF; it enhances it. While XML describes the content of a document, PDF describes its appearance. You cannot easily extract the content of a document from a PDF, at least not without a lot of manual work, because the entire structure of a document gets lost during the creation of a PDF document.

Interestingly enough, PDF 1.3 introduced a mechanism (a structure tree) that can contain XML-like data. So theoretically, it is possible to create a PDF document that contains both a structured overview of the content of a document and an exact presentation of its layout. Unfortunately, software (e.g. an XPress plug-in) to embed the XML data in a PDF file (using pdfmarks) is not available yet. Acrobat plug-ins to extract the data from a structure tree and export them to an XML-compliant file are also still in their infancy. If you need both XML and PDF, the only workaround right now is to create two separate files from the layout application or from a database publishing system.

In 2006, Adobe Labs published the first specs for Mars, a way of representing PDFs in an XML file. Acrobat 8 included support for Mars but for some reason this new approach never took off.

PDF versus Acrobat

Some people seem to confuse PDF, the data format, with Acrobat, the software suite that Adobe sells to generate, visualize, and manipulate PDF documents. This confusion seems to stem from the fact that until Acrobat 8 every new version of Acrobat brought along a new version of the PDF specifications. Acrobat 3 introduced version 1.2 of the PDF specifications, with Acrobat 4 came PDF 1.3, and so on.
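The difference between the format and the software is easy to see in the file itself: every PDF starts with a header such as %PDF-1.3 that states the version of the specification, not the program that wrote it. The following Python sketch reads that header and looks up the Acrobat release that introduced the version. The function names and the lookup table are my own illustration. Only the Acrobat 3 and Acrobat 4 entries come from the text above; the remaining entries fill in the "and so on" and are marked as such in the code.

```python
import re
from typing import Optional

# Acrobat release that introduced each PDF version.
ACROBAT_FOR_PDF_VERSION = {
    "1.2": "Acrobat 3",  # stated in the text
    "1.3": "Acrobat 4",  # stated in the text
    # The entries below continue the one-step-per-release pattern
    # (an assumption based on the "and so on" in the text).
    "1.4": "Acrobat 5",
    "1.5": "Acrobat 6",
    "1.6": "Acrobat 7",
    "1.7": "Acrobat 8",
}


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from a '%PDF-x.y' header, or None if absent."""
    match = re.match(rb"%PDF-(\d\.\d)", data)
    return match.group(1).decode("ascii") if match else None


def acrobat_release_for(data: bytes) -> Optional[str]:
    """Name the Acrobat release that introduced this file's PDF version."""
    version = pdf_header_version(data)
    return ACROBAT_FOR_PDF_VERSION.get(version) if version else None


if __name__ == "__main__":
    sample = b"%PDF-1.3\n% rest of the file would follow here\n"
    print(pdf_header_version(sample), acrobat_release_for(sample))
```

Note that the lookup only says which release introduced a version. A file marked 1.3 may have been written by any later version of Acrobat or by a tool from another vendor.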

5 thoughts on “PDF versus other file formats”

  1. DjVu is a great document format, especially for archiving. Three prominent U.S. magazines chose DjVu over PDF to archive their decades of pages: The Rolling Stone Magazine, Playboy, and The New Yorker Magazine. This was because DjVu compressed the data much better than PDF did, while retaining a fine appearance, and so the magazines could offer their archives on several DVDs of DjVus, instead of maybe 30 or 40 DVDs of PDFs.

    The DjVu format was initiated at AT&T.

    There are several reasons why DjVu has not been as popular recently. First, PDF has been improving its features and its compression. Second, people like a format that is regularly used by lots of other people. Third, there really is no great free GUI program that makes full use of all the features that DjVu offers. Fourth, many documents are only a few pages long, so that the difference in performance between PDF and DjVu is not noticed that much. Fifth, many people have not even heard of DjVu.

    The company FoxIt Technologies, which works with PDFs, purchases the rights to use DjVu-based compression technology to make high-compression PDFs that are about as efficient at compression as DjVu encoders.

    DjVu can compress an image about twice as efficiently as a standard JPG can. DjVu has become a little-known gem. There is a newish program called minidjvu-mod that has been tweaked to make it more efficient than traditional DjVu programs. With little market share, DjVu is mainly a beautiful and highly functional curiosity at this point, but it is still a fine format, and works across Windows, Mac, and Linux.

    The free program DjVuLibre can be used to produce some fine DjVu documents, with lots of encoding options available. If you have the patience to create the necessary optical character recognition file, you could even make DjVu find an image of an oak tree when you search for the word “oak,” even if the image does not have a title or caption.

    If you are a nerdy type, DjVu is also fun to tinker with behind the scenes.

  2. EPUB is meant to be reformatted by the device for better display using your selected fonts, sizes, etc. Thus it’s not comparable to XPS and PDF, which are designed to produce documents that look as close to the original as possible.
