The use of metadata in PDF files

The term metadata literally means ‘data about data’. Metadata provide additional information about a certain file, such as its author, creation date, possible copyright restrictions or the application used to create the file.

PDF files can contain metadata. This page provides an overview of the advantages of using metadata, the available techniques and the ways in which you can add, edit and view PDF metadata. The content is geared towards the graphic arts industry but may be practical for other types of PDF usage as well.

How to view the metadata in a PDF file

To view metadata in a PDF document, open it with Acrobat or Acrobat Reader and select ‘Document Properties’ in the File menu.

Applications geared towards managing libraries of data can show metadata. Adobe Bridge, for example, allows you to browse through folders containing PDF files and check basic metadata such as the author, description and copyright of each file. In theory, operating systems should also be able to do this. However, while an OS like Windows 7 is great at showing picture-related metadata (such as the resolution, bit depth and keywords) or music-related metadata (such as the artist, album and genre), it fails to do so for PDF files.

Professional content management systems not only display metadata but also allow for extensive searches based on the keywords or description field.

How to add or edit metadata

Many content creation applications, such as Microsoft Word, Adobe InDesign or Adobe Photoshop, allow users to define metadata for their files. In InDesign, for instance, you can use the ‘File Info’ menu option to define metadata such as the document title, its description, the author, keywords and copyright-related information. Such information is embedded in PDF metadata fields when a layout is exported to PDF.

PDF editing tools, such as Adobe Acrobat Professional, allow you to add metadata or edit them. For very specific types of metadata, a plug-in might be available to facilitate data entry or provide users with clear guidelines and choices for entering data.

How to remove metadata

Metadata add value to a file but there may be circumstances where you want to remove them. This is sometimes a requirement for legal reasons or done because of security or privacy concerns.

  • If you have Adobe Acrobat, select File > Properties to bring up the Document Properties window. It shows the most important metadata fields, which you can clear by hand. The free Adobe Reader shows the same window but does not let you change the fields.
  • To remove metadata in individual files, you can also use the PDF Optimizer option in Adobe Acrobat. In Acrobat 9 Professional select Advanced > PDF Optimizer. In the window that pops up, select the Discard User Data option on the left and enable the Discard document information and metadata checkbox on the right. If you need to clean dozens or hundreds of files, you can do so using the batch function of Acrobat Professional: select Advanced > Document Processing > Batch Processing. Click on New Sequence and name the new sequence (don’t worry about the Select sequence of commands box, just click on the Output Options button at the bottom). In the Output Options window activate the PDF Optimizer option and click on Settings, edit the Optimizer settings as desired and name the settings. When you are back at the Batch Sequences window, run the sequence you just created, choose your files and let Acrobat do its thing.
  • If you have the Enfocus Pitstop plug-in for Acrobat, it includes an action for removing metadata. The Callas pdfAutoOptimizer tool has a similar function.
  • There are command line tools to batch clean PDF files as well as companies that offer this type of service for a fee. Google is your friend.

How metadata are stored in PDF files

There are several mechanisms available within PDF files to add metadata:

  • The Info Dictionary has been included in PDF since version 1.0. It contains a set of document info entries, simple pairs of data that consist of a key and a matching value. Some of these are predefined, such as Title, Author, Subject, Keywords, CreationDate (the creation date), ModDate (the latest modification date), Creator (the originating application) and Producer (the application or library that generated the PDF). Acrobat displays the last four as Created, Modified, Application and PDF Producer. Applications can add their own sets of data to the info dictionary.
  • XMP (Extensible Metadata Platform) is an Adobe technology for embedding metadata into files. It can be used with a wide variety of data files. With Acrobat 5 and PDF 1.4 (2001) this mechanism was also made available for PDF files. XMP is more powerful than the info dictionary, which is why it is used in a number of PDF-based metadata standards.
  • Additional ways of embedding metadata are the PieceInfo Dictionary (used by Illustrator and Photoshop for application specific data when you save a file as a PDF), Object Data (or User Properties) and Measurement Properties.
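
As a rough illustration of the XMP mechanism, the Python sketch below scans raw file bytes for an XMP packet and reads three Dublin Core fields from it. It assumes the packet is stored uncompressed and unencrypted, which is common but not guaranteed; for other files a proper PDF library is needed. The sample bytes and the function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used by the XMP fields read below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# An XMP packet sits between <?xpacket begin=...?> and <?xpacket end=...?>.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)


def find_xmp_packet(pdf_bytes):
    """Return the XML inside the first XMP packet, or None if none is found."""
    match = PACKET_RE.search(pdf_bytes)
    return match.group(1).strip() if match else None


def read_basic_fields(xmp_xml):
    """Read title, creators and keywords (dc:subject) from XMP XML."""
    root = ET.fromstring(xmp_xml)
    title = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    creators = root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)
    keywords = root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)
    return {
        "title": title.text if title is not None else None,
        "creators": [c.text for c in creators],
        "keywords": [k.text for k in keywords],
    }


# Hypothetical stand-in for the bytes of a PDF file (not a complete PDF).
SAMPLE_PDF_BYTES = b"""%PDF-1.4
... other PDF objects would appear here ...
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">SmartGuide 12</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq></dc:creator>
   <dc:subject><rdf:Bag><rdf:li>prepress</rdf:li><rdf:li>metadata</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
"""

fields = read_basic_fields(find_xmp_packet(SAMPLE_PDF_BYTES))
```

For a real file you would pass the result of open(path, "rb").read() instead of the sample bytes. The sketch only reads fields stored as XML elements; XMP also allows simple values to be written as attributes, which this code ignores.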

PDF metadata standards

There are a number of interesting standards for enriching PDF files with metadata. Below is a short summary:

  • There are PDF substandards such as PDF/X and PDF/A that require the use of specific metadata. In a PDF/X-1a file, for example, there has to be a metadata field that describes whether the PDF file has been trapped or not.
  • The GWG ad ticket provides a standardized way to include advertisement metadata into a PDF file.
  • Certified PDF is a proprietary mechanism for embedding metadata about preflighting, that is, whether a PDF file intended to be printed by a commercial printer or newspaper has been properly checked for the presence of all fonts, images of sufficient resolution and so on.
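
The PDF/X trapping requirement mentioned above can be sketched as a small check. The Python snippet assumes the Info Dictionary entries have already been read into a plain dict by some PDF library, with the Trapped value given as the string "True", "False" or "Unknown". Both the dict layout and the function name are assumptions made for this illustration, and a real PDF/X preflight checks far more than this one field.

```python
def trapping_declared(info):
    """Return True if the Info entries state explicitly whether the file is trapped.

    'Unknown' or a missing Trapped entry does not count as a declaration.
    """
    return info.get("Trapped") in ("True", "False")


# Hypothetical Info Dictionary contents.
good = {"Title": "Full page ad", "Trapped": "False"}
vague = {"Title": "Full page ad", "Trapped": "Unknown"}
missing = {"Title": "Full page ad"}

results = [trapping_declared(d) for d in (good, vague, missing)]
```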

The filename is metadata as well

The easiest way to add information about a PDF to the file is by giving it a proper filename. A name like ‘SmartGuide_12_p057-096_v3.pdf’ tells a recipient much more about what the file is about than ‘pages_part2_nextupdate.pdf’ does.

  • Add the name of the publication and possibly the edition to the filename.
  • Add a revision number (e.g. ‘v3’) if there will be multiple updates of a file.
  • If a file contains part of the pages of a publication add at least the initial folio to the filename. That allows people to easily sort files in the right order. Use 2 or 3 digits for the page number (e.g. ‘009’ instead of just ‘9’).
  • Do not use characters that are not supported in other operating systems or that have a special meaning in some applications: * < > [ ] = + " \ / , . : ; ? % # $ | & •. Do not use a space as the first or last character of the filename.
  • Don’t make the filename too long. Once you go beyond 50 characters or so people may not notice the full information or the filename may get clipped in browser windows or applications.
  • Many prepress workflow systems can automatically insert files into a job based on a specific naming convention. This speeds up the processing of the job and can avoid costly mistakes. Consult with your printer – they may have guidelines for submitting files.
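
The guidelines above are easy to automate. The following Python sketch builds a filename in the style of the ‘SmartGuide_12_p057-096_v3.pdf’ example and checks a name against the character, space and length rules. The underscore-separated pattern, the function names and the hard limit of 50 characters are assumptions for this example, not a standard; a printer may prescribe a different convention.

```python
# Characters the guidelines above advise against.
FORBIDDEN = set('*<>[]=+"\\/,.:;?%#$|&•')
MAX_LENGTH = 50  # rough limit suggested above


def build_filename(publication, edition, first_page, last_page, revision):
    """Build a name like SmartGuide_12_p057-096_v3.pdf (3-digit page numbers)."""
    return f"{publication}_{edition}_p{first_page:03d}-{last_page:03d}_v{revision}.pdf"


def filename_problems(name):
    """Return a list of guideline violations; an empty list means the name is fine."""
    problems = []
    # Check everything before the last dot, so the extension dot is allowed.
    stem = name.rsplit(".", 1)[0]
    bad = sorted(set(stem) & FORBIDDEN)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    if name.startswith(" ") or name.endswith(" "):
        problems.append("leading or trailing space")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


name = build_filename("SmartGuide", 12, 57, 96, 3)
ok = filename_problems(name)                    # no problems expected
flagged = filename_problems(" draft.pdf")       # starts with a space

# Zero-padded page numbers keep a plain alphabetical sort in page order.
parts = sorted(build_filename("SmartGuide", 12, p, p + 15, 1) for p in (105, 9, 57))
```

Without the zero padding, ‘p105’ would sort before ‘p9’, which is exactly the ordering problem the 3-digit rule avoids.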

Other sources of information

The B4print forum has a pretty good thread about removing metadata from which I picked up some useful information.

9 August 2013

10 Responses to “The use of metadata in PDF files”

  1. James Terry V says:

    Are there any Adobe alternatives that can be used to edit/add/remove metadata from .PDF documents?

  2. Cliff Bennett says:

    I deal with other people’s files as much as my own. If everybody doesn’t use metadata in a standardized way, it quickly loses usefulness for me.

    But maybe I’ll have to re-examine the subject and see if I can use it to help me manage files prepared by others.

  3. PM Redulla, Jr. says:

    @James:

    You can try BeCyPDFMetaEdit, which is free.

    http://www.becyhome.de/becypdfmetaedit/description_eng.htm

  4. j walker says:

    We are scanning ~300 manuals to .pdf. Will include keywords to assist in locating a specific manual. i understand how to use the Acrobat search tool to find documents via keywords; is it possible to use the Windows search function for keywords in pdf document properties?
    thnx

    jsw

    • Laura says:

      Hi, J Walker:

      I have the same question. Did you figure out how to make the keywords of a PDF searchable in Windows’ search engine?

      Thanks,
      Laura

  5. Joe says:

    Does anyone know how to preserve metadata when combining multiple photos into one PDF file using Adobe Master Suite CS6?

  6. Ricardo Soares says:

    Hi,
    look for Exif Tool. Not very intuitive but very supported by the owner..

    http://www.sno.phy.queensu.ca/~phil/exiftool/

  7. Thanks for your info regarding metadata in PDF files. I created a website about business letters and converted the documents from .doc to .pdf. Why does the metadata still show when Google links directly to the PDF file, even though the properties of the .doc file were deleted?

    Regards
    PH

  8. Himakara says:

    Hi,

    I am new to InDesign scripting. I want to know how to change the metadata keyword from “ABC” to “XYZ” using a script.

    Thanks

  9. Abbas says:

    Hi friends
    I am a student doing my literature review. As I am downloading many pdf files, I need a software to manage the pdf files by adding my desired fields like author/reading status/date read/year/summary/comments,… easily and viewing/editing all the fields of many files very easily. The metadata tool in the “properties” menu of adobe acrobat is not friendly at all and I need to add my required fields for any file. Do you have any solution. Common softwares are also suggesting limited fields for metadata.

    Regards
    Abbas

