LZW compression

LZW is named after Abraham Lempel, Jakob Ziv and Terry Welch, the scientists who developed this compression algorithm. It is a lossless ‘dictionary based’ compression algorithm. Dictionary based algorithms scan a file for sequences of data that occur more than once. These sequences are then stored in a dictionary and within the compressed file, references are put where-ever repetitive data occurred.

Lempel and Ziv published a series of papers describing various compression algorithms. Their first algorithm was published in 1977, hence its name: LZ77. This compression algorithm maintains its dictionary within the data themselves.

Suppose you want to compress the following string of text: the quick brown fox jumps over the lazy dog. The word ‘the’ occurs twice in the file so the data can be compressed like this: the quick brown fox jumps over << lazy dog. in which << is a pointer to the first 4 characters in the string.

In 1978, Lempel and Ziv published a second paper outlining a similar algorithm that is now referred to as LZ78. This algorithm maintains a separate dictionary.

Suppose you once again want to compress the following string of text: the quick brown fox jumps over the lazy dog. The word ‘the’ occurs twice in the file so this string is put in an index that is added to the compressed file and this entry is referred to as *. The data then look like this: * quick brown fox jumps over * lazy dog.

How LZW works

In 1984, Terry Welch was working on a compression algorithm for high-performance disk controllers. He developed a rather simple algorithm that was based on the LZ78 algorithm and that is now called LZW.

LZW compression replaces strings of characters with single codes. It does not do any analysis of the incoming text. Instead, it just adds every new string of characters it sees to a table of strings. Compression occurs when a single code is output instead of a string of characters.

The code that the LZW algorithm outputs can be of any arbitrary length, but it must have more bits in it than a single character. The first 256 codes (when using eight bit characters) are by default assigned to the standard character set. The remaining codes are assigned to strings as the algorithm proceeds. The sample program runs as shown with 12 bit codes. This means codes 0-255 refer to individual bytes, while codes 256-4095 refer to substrings.

Advantages and disadvantages

  • LZW compression works best for files containing lots of repetitive data. This is often the case with text and monochrome images. Files that are compressed but that do not contain any repetitive information at all can even grow bigger!
  • LZW compression is fast.
  • LZW is a fairly old compression technique. All recent computer systems have the horsepower to use more efficient algorithms.
  • Royalties have to be paid to use LZW compression algorithms within applications (see below).

Where is LZW compression used?

LZW compression can be used in a variety of file formats:

  • TIFF files
  • GIF files
  • PDF files – In recent applications LZW has been replaced by the more efficient Flate algorithm.

Remarks

Some versions of LZW compression are copyrighted. Since a couple of year, their owner (Unisys) demands royalties from any company using their algorithm. This has seriously hampered the popularity of LZW compression and in the long run we will probably see it being replaced by less costly (read: free) algorithms. As an end user, you don’t have to worry because only software manufacturers have to pay license fees. But in the end, you do pay for this since the licensing costs have to be covered by the price of software.

8 August 2013

11 Responses to “LZW compression”

  1. GNANI says:

    what is the difference between LZW compression ,RLE compression

  2. manpreet uppal says:

    very gud thnx ….

  3. Jhun says:

    Is it possible to use LZW for pdf files? If yes, what software of plug-ins that is capable of this. Thanks in advance.

    • Laurens says:

      LZW can be used in PDF files – I’ve modified the above page to reflect this.
      If you selected lossless compression for B&W or grayscale images in older versions of Acrobat Distiller or chose to compress text, Distiller would use LZW. I’ve been told by one of the Adobe guys that this has now changed and the more efficient Flate/Deflate algorithm is used. I don’t know if other applications still stick to using LZW.

  4. Jhun says:

    Thanks Laurens! But can you please tel me what particular version of Acrobat Distiller has this functionality?

  5. dhanpal singh says:

    all the things given to r so good

  6. amann says:

    can anyone please provide me the code for lzw in matlab.its uregent.

  7. jo says:

    ..trying to ‘save as’ a TIFF file to LZW compression in Photoshop CS4
    There are 20? options but LZW is not one of them

    any advice

    jo

  8. Ninad says:

    sir is it possible to combine one or two of the algorithms to create a new one?
    for example…combining RLE & LFZ

  9. sapna says:

    from where can we get the code of this technique???


Advertising