PDF versus other file formats

Several other file formats serve purposes similar to PDF's. Below I list some of them and try to explain how these alternatives differ from, or resemble, PDF. The comparisons cover XPS, PostScript, HTML, XML, and Acrobat.

PDF versus XPS

XPS is so far the most serious alternative to PDF that has come to market. You can find more information about XPS here. That page also contains a comparison of the two file formats.

PDF versus PostScript

PDF was developed by Adobe, the company that also created PostScript. In fact, PDF is based on PostScript. It shares PostScript's imaging model, and its page-description operators are largely derived from PostScript's, but it uses them in a different way: while PostScript is really a programming language that you could even use to write a chess program or word processor, PDF is more limited in its goal. It describes the layout of a document using a fixed set of PostScript-style operators. As such, PDF resembles a database rather than a programming language.
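To make the "database" comparison more concrete, here is a minimal sketch in Python that assembles a tiny one-page PDF by hand. It is an illustration, not a production PDF writer: the function names `build_minimal_pdf` and `object_at` are made up for this example, the text is assumed to be plain ASCII without parentheses or backslashes, and the page size and font are arbitrary choices. The point is that the file is a set of numbered objects plus a cross-reference table of byte offsets, while the page content itself is a short, non-programmable list of drawing operators.

```python
def build_minimal_pdf(text="Hello"):
    """Return (pdf_bytes, offsets) for a one-page PDF showing `text`.

    Assumption: `text` is plain ASCII without '(', ')' or '\\',
    so it needs no escaping inside a PDF string literal.
    """
    # Page content: begin text, select font, position, show text, end text.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")

    # The "records" of the file: numbered objects that refer to each other.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # The "index": a cross-reference table with one 20-byte entry per object.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


def object_at(pdf, offset):
    """Return the raw bytes of the object that starts at `offset`."""
    end = pdf.index(b"endobj", offset) + len(b"endobj")
    return pdf[offset:end]


if __name__ == "__main__":
    pdf, offsets = build_minimal_pdf("Hello")
    # Jump straight to object 4 (the page content) via its recorded offset.
    print(object_at(pdf, offsets[3]).decode("ascii"))
```

A viewer can jump directly to any object through the table instead of executing the file from top to bottom, which is what a PostScript interpreter has to do.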

The main advantages of PDF over PostScript are:

  • PDF files tend to be smaller because of the more efficient compression algorithms that can be used. Algorithms such as JBIG2 and JPEG 2000 are not available in PostScript.
  • PDF files can easily be visualized using Adobe Reader, Adobe Acrobat, or other tools.
  • PDF files are easier to modify provided you have the proper tools.
  • PDF supports transparency and ICC-based color management.
  • PDF files are more device-independent. PostScript files are often created for a specific device and will generate PostScript errors if sent to another device.
  • PDF files can be more versatile than PostScript files: they can contain links to other data as well as interactive elements (multimedia, forms, 3D, ...).

PDF versus HTML

PDF is often compared to HTML, the data format used to create web pages. Originally HTML was geared towards describing the structure of a document, rather than its appearance. The appearance of a web page was determined by the browser, not by the creator of the document. With the increasing popularity of the World Wide Web, newer versions of HTML focused more on the visual aspect of web pages, rather than their content. So in a way, HTML moved towards the goals that PDF tries to achieve.

At the same time, Adobe put more and more web functionality in PDF. We got the option to add Internet links to PDF documents. An Adobe Reader plug-in for web browsers like Netscape Navigator or Internet Explorer became available, and Acrobat 4 gained an option to convert a website or part of a website to a PDF document. Adobe also provided a mechanism for byte streaming in PDF, so you do not have to download an entire PDF file to see the first page of the document.
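The byte-streaming mechanism mentioned above is called linearization in the PDF specification (Acrobat labels it "Fast Web View"). A linearized file announces itself with a small dictionary containing a `/Linearized` key near the start of the file. The Python sketch below uses that as a rough check. The function name `looks_linearized` is made up for this example, and the test is only a heuristic: it does not verify that the rest of the file is still consistent with the linearization data.

```python
def looks_linearized(data):
    """Heuristic: does the start of the file declare /Linearized?

    `data` is the raw content of a PDF file (or at least its first 1024 bytes).
    A True result only means the marker is present; the file may have been
    modified since without the linearization data being updated.
    """
    return b"/Linearized" in data[:1024]


if __name__ == "__main__":
    import sys

    # Usage: python check_linearized.py some.pdf
    with open(sys.argv[1], "rb") as handle:
        head = handle.read(1024)
    print("linearized" if looks_linearized(head) else "not linearized")
```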

So PDF and HTML are becoming competing standards. Right now PDF is still more powerful when it comes to describing the appearance of documents, while HTML is better suited for low-speed Internet access. But it is perfectly feasible to use PDF on websites and to use HTML for a CD-ROM-based electronic catalog.

PDF versus XML

XML, the eXtensible Markup Language, is a data format that can be used to describe the content of documents (similar to SGML). It has received a lot of attention recently, mainly because its flexibility allows for easy integration with databases as well as Internet publishing and data exchange. XML does not really compete with PDF; it complements it. While XML describes the content of a document, PDF describes its appearance. You cannot easily extract the content of a document from a PDF, at least not without a lot of manual work, because the entire structure of the document gets lost during the creation of the PDF.

Interestingly enough, PDF 1.3 introduced a mechanism (a structure tree) that can contain XML-like data. So theoretically, it is possible to create a PDF document that contains both a structured overview of the content of a document and an exact presentation of its layout. Unfortunately, software (e.g. an XPress plug-in) to embed the XML data in a PDF file (using pdfmarks) is not available yet. Acrobat plug-ins to extract the data from a structure tree and export them to an XML-compliant file are also still in their infancy. If you need both XML and PDF, the only workaround right now is to create two separate files from the layout application or from a database publishing system.

In 2006, Adobe Labs published the first specs for Mars, a way of representing PDFs as XML. Acrobat 8 included support for Mars, but for some reason this new approach never took off.

PDF versus Acrobat

Some people seem to confuse PDF, the data format, with Acrobat, the software suite that Adobe sells to generate, view, and manipulate PDF documents. This confusion seems to stem from the fact that until Acrobat 8 every new version of Acrobat brought along a new version of the PDF specifications. Acrobat 3 introduced version 1.2 of the PDF specifications, with Acrobat 4 came PDF 1.3, and so on.
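The distinction is visible in the files themselves: a PDF starts with a header such as `%PDF-1.3`, which names the version of the format, not the version of the software that wrote it. The Python sketch below reads that header. The function name `header_version` and the lookup table are made up for this example. Only the Acrobat 3 and Acrobat 4 entries come from the text above. The remaining entries are filled in from general knowledge of the release history. Note also that later PDF versions allow a `/Version` entry in the document catalog to override the header, which this sketch ignores.

```python
import re

# Acrobat release -> PDF version it introduced.
ACROBAT_TO_PDF = {
    3: "1.2",  # stated in the text
    4: "1.3",  # stated in the text
    5: "1.4",  # the entries below are not from the text (added for illustration)
    6: "1.5",
    7: "1.6",
    8: "1.7",
}

_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def header_version(data):
    """Return the version named in the file header, e.g. '1.3', or None.

    `data` is the raw content of a file (its first 1024 bytes are enough).
    """
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii")


if __name__ == "__main__":
    sample = b"%PDF-1.3\n% rest of the file would follow here\n"
    version = header_version(sample)
    writers = [a for a, v in ACROBAT_TO_PDF.items() if v == version]
    print(version, "was introduced with Acrobat", writers)
```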

4 thoughts on “PDF versus other file formats”

  1. EPUB is meant to be reformatted by the device for better display using your selected fonts, sizes, etc. Thus it’s not comparable to XPS and PDF, which are designed to produce documents that look as close to the original as possible.
