The term metadata literally means ‘data about data’. Metadata provide additional information about a certain file, such as its author, creation date, possible copyright restrictions or the application used to create the file.
PDF files can contain metadata. This page provides an overview of the advantages of using metadata, the available techniques and the ways in which you can add, edit and view PDF metadata. The content is geared towards the graphic arts industry but may be practical for other types of PDF usage as well.
How to view the metadata in a PDF file
To view metadata in a PDF document, open it with Acrobat or Acrobat Reader and select ‘Document Properties’ in the File menu.
Applications geared towards managing libraries of data can show metadata. Adobe Bridge, for example, allows you to browse through folders containing PDF files and check basic metadata such as the author, description, and copyright of PDF files. Theoretically, operating systems should also be able to do this, but while an OS like Windows 7 is great at showing picture-related metadata (such as the resolution, bit depth and keywords) or music-related metadata (such as the artist, album, and genre), it fails to do so for PDF files.
Professional content management systems can not only display metadata but also allow for extensive searches based on the keywords or description fields.
How to add or edit metadata
Many content creation applications, such as Microsoft Word, Adobe InDesign or Adobe Photoshop, allow users to define metadata for their files. In InDesign, for instance, you can use the ‘File Info’ menu option to define metadata such as the document title, its description, the author, keywords and copyright-related information. Such information is embedded in PDF metadata fields when a layout is exported to PDF.
PDF editing tools, such as Adobe Acrobat Professional, allow you to add metadata or edit them. For very specific types of metadata, a plug-in might be available to facilitate data entry or provide users with clear guidelines and choices for entering data.
How to remove metadata
Metadata add value to a file but there may be circumstances where you want to remove them. This is sometimes a requirement for legal reasons or done because of security or privacy concerns.
- If you have Adobe Acrobat, select File > Properties to bring up the Document Properties window. This shows the most important metadata fields, which you can clear by hand. The free Adobe Reader shows the same window but does not let you edit the fields.
- To remove metadata in individual files, you can also use the PDF Optimizer option in Adobe Acrobat. In Acrobat 9 Professional, select Advanced > PDF Optimizer. In the window that pops up, select the Discard User Data option on the left and enable the Discard document information and metadata checkbox on the right. If you need to clean dozens or hundreds of files, you can do so using the batch function of Acrobat Professional: select Advanced > Document Processing > Batch Processing. Click on New Sequence and name the new sequence. Don’t worry about the Select sequence of commands box; just click on the Output Options button at the bottom. In the Output Options window, activate the PDF Optimizer option, click on Settings, edit the Optimizer settings as desired and name the settings. When you are back at the Batch Sequences window, run the sequence you just created, choose your files and let Acrobat do its thing.
- If you have the Enfocus Pitstop plug-in for Acrobat, it includes an action for removing metadata. The Callas pdfAutoOptimizer tool has a similar function.
- There are command line tools to batch clean PDF files as well as companies that offer this type of service for a fee. Google is your friend.
How metadata are stored in PDF files
There are several mechanisms available within PDF files to add metadata:
- The Info Dictionary has been included in PDF since version 1.0. It contains a set of document info entries, simple pairs of data that consist of a key and a matching value. Some of these are predefined, such as Title, Author, Subject, Keywords, CreationDate (the creation date, shown in Acrobat as Created), ModDate (the latest modification date, shown as Modified), Creator (the originating application, shown as Application) and Producer (the application or library that generated the PDF). Applications can add their own sets of data to the info dictionary.
- XMP (Extensible Metadata Platform) is an Adobe technology for embedding metadata into files. It can be used with a wide variety of data files. With Acrobat 5 and PDF 1.4 (2001) this mechanism was also made available for PDF files. XMP is more powerful than the info dictionary, which is why it is used in a number of PDF-based metadata standards.
- Additional ways of embedding metadata are the PieceInfo Dictionary (used by Illustrator and Photoshop for application specific data when you save a file as a PDF), Object Data (or User Properties) and Measurement Properties.
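To make the first two mechanisms more concrete, here is a minimal Python sketch that looks for info dictionary entries and an XMP packet in the raw bytes of a file. The sample bytes, the author name and the function names `read_info_entries` and `read_xmp_keywords` are made up for illustration. The sketch assumes the entries are stored as plain, unencrypted literal strings and that the XMP packet is uncompressed, which is often but not always the case. It is no substitute for a real PDF library.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the bytes of a PDF file (hypothetical content,
# not a complete or valid PDF).
SAMPLE_PDF = b"""%PDF-1.4
1 0 obj
<< /Title (SmartGuide 12) /Author (Jane Doe) /Trapped /False >>
endobj
2 0 obj
<< /Type /Metadata /Subtype /XML >>
stream
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <pdf:Keywords>prepress, metadata</pdf:Keywords>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
endstream
endobj
"""

# Matches entries such as "/Title (some text)". Nested parentheses, hex
# strings and compressed objects are deliberately not handled.
INFO_ENTRY = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer)\s*\(((?:\\.|[^\\()])*)\)"
)

# An XMP packet sits between an "xpacket begin" and an "xpacket end" marker.
XMP_PACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def read_info_entries(data):
    """Return the key/value pairs of the info entries found in the bytes."""
    return {
        match.group(1).decode("ascii"): match.group(2).decode("latin-1")
        for match in INFO_ENTRY.finditer(data)
    }


def read_xmp_keywords(data):
    """Return the pdf:Keywords value of the first XMP packet, or None."""
    match = XMP_PACKET.search(data)
    if match is None:
        return None
    root = ET.fromstring(match.group(1))
    node = root.find(".//{http://ns.adobe.com/pdf/1.3/}Keywords")
    return node.text if node is not None else None


print(read_info_entries(SAMPLE_PDF))
print(read_xmp_keywords(SAMPLE_PDF))
```

The same scan can serve as a rough check after cleaning a file: if it still finds entries or a packet, some metadata survived. An empty result proves little, though, because compressed or encrypted metadata would be missed.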
PDF metadata standards
There are a number of interesting standards for enriching PDF files with metadata. Below is a short summary:
- There are PDF substandards such as PDF/X and PDF/A that require the use of specific metadata. In a PDF/X-1a file, for example, there has to be a metadata field that describes whether the PDF file has been trapped or not.
- The GWG ad ticket provides a standardized way to include advertisement metadata into a PDF file.
- Certified PDF is a proprietary mechanism for embedding metadata about preflighting: whether a PDF file intended to be printed by a commercial printer or newspaper has been properly checked for the presence of all fonts, images with a sufficient resolution, and so on.
- The GWG processing steps specification is fairly new and meant to standardize the way production information for the printing industry can be embedded in PDF files. This is done using both additional objects and metadata. By standardizing the way information about die cutting, embossing, varnishing, etc. is included in a PDF, it will become easier for brands, design agencies, converters and printers to collaborate and automate production.
The filename is metadata as well
The easiest way to add information about a PDF to the file is by giving it a proper filename. A name like ‘SmartGuide_12_p057-096_v3.pdf’ tells a recipient much more about what the file is about than ‘pages_part2_nextupdate.pdf’ does.
- Add the name of the publication and possibly the edition to the filename.
- Add a revision number (e.g. ‘v3’) if there will be multiple updates of a file.
- If a file contains only some of the pages of a publication, add at least the initial folio to the filename. That allows people to easily sort files in the right order. Use 2 or 3 digits for the page number (e.g. ‘009’ instead of just ‘9’).
- Do not use characters that are not supported in other operating systems or that have a special meaning in some applications: * < > [ ] = + ” \ / , . : ; ? % # $ | & •. Do not use a space as the first or last character of the filename.
- Don’t make the filename too long. Once you go beyond 50 characters or so, people may not notice the full information, or the filename may get clipped in browser windows or applications.
- Many prepress workflow systems can automatically insert files into a job based on a specific naming convention. This speeds up the processing of the job and can avoid costly mistakes. Consult with your printer – they may have guidelines for submitting files.
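The guidelines above are easy to automate. The following Python sketch builds a filename in the style of the ‘SmartGuide_12_p057-096_v3.pdf’ example and checks an existing name against the rules. The function names, the underscore-separated layout and the hard limit of 50 characters are assumptions made for illustration. The check also skips the final ‘.pdf’ extension when looking for forbidden characters, which is one interpretation of the rule about periods. Your printer’s naming convention may differ.

```python
# Characters the guidelines advise against (both straight and curly quotes).
FORBIDDEN = set('*<>[]=+"”\\/,.:;?%#$|&•')
MAX_LENGTH = 50  # assumed hard limit; the text only says "50 characters or so"


def build_filename(publication, edition, first_page, last_page, revision):
    """Build a name such as SmartGuide_12_p057-096_v3.pdf (3-digit folios)."""
    return (
        f"{publication}_{edition}_p{first_page:03d}-{last_page:03d}"
        f"_v{revision}.pdf"
    )


def check_filename(name):
    """Return a list of problems found in the filename (empty if none)."""
    problems = []
    # Ignore the ".pdf" extension so its period is not reported.
    stem = name[:-4] if name.lower().endswith(".pdf") else name
    bad = sorted({ch for ch in stem if ch in FORBIDDEN})
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if name.startswith(" ") or name.endswith(" "):
        problems.append("space at start or end")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


print(build_filename("SmartGuide", 12, 57, 96, 3))
print(check_filename(" my:file.pdf"))
```

A check like this could run before files are uploaded to a printer, so that naming mistakes are caught before a workflow system rejects or misfiles the job.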
Other sources of information
The B4print forum has a pretty good thread about removing metadata from which I picked up some useful information.