PDF versus other file formats

A number of other file formats serve purposes similar to PDF. Below I list some of them and try to explain how these alternatives differ from or resemble PDF. The comparisons covered are PDF versus XPS, PostScript, HTML, XML and Acrobat.

PDF versus XPS

XPS is so far the most serious alternative to PDF that has come to market. More information about XPS, including a comparison of both file formats, can be found on a separate page about XPS.

PDF versus PostScript

PDF was developed by Adobe, the company that also created PostScript. In fact, PDF is based on PostScript. It borrows the PostScript imaging model and many of its drawing operators but uses them in a different way: while PostScript is a real programming language that you could even use to write a chess program or a word processor, PDF is more limited in its goal. It just describes the layout of a document (using PostScript-style operators). As such, PDF resembles a database rather than a programming language.
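To make the contrast concrete, here is a minimal sketch in Python. It is not taken from any Adobe tool: the helper names `escape_pdf_string` and `text_lines_to_content_stream`, the font resource name `/F1` and the page coordinates are all assumptions chosen for illustration. A PostScript program could draw repeated lines of text with a loop, but a PDF content stream has no loops or variables, so the program that generates the file has to write out every operator explicitly.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_lines_to_content_stream(lines, x=72, y=720, leading=14, font="/F1", size=12):
    """Return a flat PDF content stream that draws each line of text.

    Hypothetical helper for illustration: the loop runs here, in the
    generating program, and only its unrolled result ends up in the PDF.
    """
    ops = ["BT", f"{font} {size} Tf", f"{x} {y} Td"]  # begin text, set font, set position
    for index, line in enumerate(lines):
        if index > 0:
            ops.append(f"0 -{leading} Td")  # move down one line
        ops.append(f"({escape_pdf_string(line)}) Tj")  # show the string
    ops.append("ET")  # end text
    return "\n".join(ops)


print(text_lines_to_content_stream(["Hello", "PDF (flat)"]))
```

The printed result is a plain list of operators (`BT`, `Tf`, `Td`, `Tj`, `ET`) with their operands, which is the sense in which a PDF page is data to be read rather than a program to be run.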

The main advantages of PDF over PostScript are:

  • PDF files tend to be smaller because of the efficient compression algorithms that can be used.
  • PDF files can easily be visualized using Acrobat or other tools.
  • PDF files are easier to modify provided you have the proper tools.
  • PDF files are more device independent. PostScript files are often created for a specific device and will generate PostScript errors if sent to another device.
  • PDF files can be more versatile than PostScript files: they can contain links to other data as well as multimedia elements.
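
The first point in the list above can be illustrated with a short Python sketch. PDF streams are commonly stored with the Flate filter (`/FlateDecode`), which is based on the same zlib/deflate compression that Python's `zlib` module provides. The sample content stream below is made up for the illustration, and the savings on a real document depend entirely on its content.

```python
import zlib

# A made-up, repetitive content stream: 200 lines of text-drawing operators.
raw = "\n".join(
    f"BT /F1 12 Tf 72 {720 - i} Td (Line {i}) Tj ET" for i in range(200)
).encode("ascii")

# Compress at the highest zlib level, as a PDF writer might do for a stream.
compressed = zlib.compress(raw, 9)

print(len(raw), "bytes uncompressed")
print(len(compressed), "bytes after Flate compression")

# The compression is lossless: decompressing restores the exact bytes.
assert zlib.decompress(compressed) == raw
```

Page descriptions tend to repeat the same operators over and over, which is why this kind of compression works well on them.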

PDF versus HTML

PDF is often compared to HTML, the data format used to create web pages. Originally HTML was geared towards describing the structure of a document, rather than its appearance. The appearance of a web page was determined by the browser, not by the creator of the document. With the increasing popularity of the World Wide Web, newer versions of HTML focused more on the visual aspect of web pages, rather than their content. So in a way, HTML moved towards the goals that PDF tries to achieve.

At the same time, Adobe put more and more web functionality into PDF. It became possible to add Internet links to PDF documents. An Acrobat Reader plug-in for web browsers such as Netscape Navigator and Internet Explorer became available, and Acrobat 4 added an option to convert a web site, or part of one, to a PDF document. Adobe also provided a mechanism for byte streaming in PDF, so you do not have to download an entire PDF file to see the first page of the document.

So PDF and HTML are becoming competing standards. Right now PDF is still more powerful when it comes to describing the appearance of documents, while HTML is better suited for low-speed Internet access. But it is perfectly feasible to use PDF on web sites and to use HTML for a CD-ROM based electronic catalogue.

PDF versus XML

XML, the eXtensible Markup Language, is a data format that can be used to describe the content of documents (similar to SGML). It has received a lot of attention recently, mainly because its flexibility allows for easy integration with databases as well as Internet publishing and data exchange. XML does not really compete with PDF; it complements it. While XML describes the content of a document, PDF describes its appearance. You cannot easily extract the content of a document from a PDF, at least not without a lot of manual work, because the entire structure of a document gets lost during the creation of a PDF document.
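
The difference between describing content and describing appearance can be sketched in a few lines of Python. The element names (`article`, `title`, `para`), the renderer `to_content_stream`, the font name `/F1` and the point sizes are all hypothetical, chosen only for this illustration; the sample strings contain no parentheses, so no string escaping is needed.

```python
import xml.etree.ElementTree as ET

# Content view: the XML names what each piece of text *is*.
article = ET.Element("article")
ET.SubElement(article, "title").text = "PDF versus XML"
ET.SubElement(article, "para").text = "XML describes content."
xml_view = ET.tostring(article, encoding="unicode")


def to_content_stream(root):
    """Hypothetical renderer: titles at 18 pt, everything else at 10 pt."""
    ops, y = ["BT"], 720
    for element in root:
        size = 18 if element.tag == "title" else 10
        ops.append(f"/F1 {size} Tf")         # choose a font size
        ops.append(f"1 0 0 1 72 {y} Tm")     # position the text on the page
        ops.append(f"({element.text}) Tj")   # draw the string
        y -= size + 6
    ops.append("ET")
    return "\n".join(ops)


# Appearance view: only positions, sizes and strings remain.
pdf_view = to_content_stream(article)

print(xml_view)
print(pdf_view)
```

In the second output nothing says which string was the title and which was a paragraph; a reader can only guess from the font size. That is the loss of structure described above.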

Interestingly enough, PDF 1.3 introduced a mechanism (a structure tree) that can contain XML-like data. So theoretically, it is possible to create a PDF document that contains both a structured overview of the content of a document and an exact presentation of its layout. Unfortunately, software (e.g. an XPress plug-in) to embed the XML data in a PDF file (using pdfmarks) is not available yet. Acrobat plug-ins to extract the data from a structure tree and export them to an XML-compliant file are also still in their infancy. If you need both XML and PDF, the only workaround right now is to create two separate files from the layout application or from a database publishing system.

In 2006, Adobe Labs published the first specs for Mars, a way of representing PDF files in XML. Acrobat 8 includes support for Mars, but the technology is not yet in use.

PDF versus Acrobat

Many people seem to confuse PDF, the data format, with Acrobat, the software suite that Adobe sells to generate, visualize and manipulate PDF documents. This confusion seems to stem from the fact that every new version of the PDF specifications brings along a new version of Acrobat. Version 1.2 of the PDF specifications was accompanied by Acrobat 3. PDF 1.3 first materialized in Acrobat 4, and so on.
