LZW is named after Abraham Lempel, Jakob Ziv and Terry Welch, the scientists who developed this compression algorithm. It is a lossless ‘dictionary based’ compression algorithm. Dictionary based algorithms scan a file for sequences of data that occur more than once. These sequences are then stored in a dictionary and within the compressed file, references are put where-ever repetitive data occurred.
Lempel and Ziv published a series of papers describing various compression algorithms. Their first algorithm was published in 1977, hence its name: LZ77. This compression algorithm maintains its dictionary within the data themselves.
Suppose you want to compress the following string of text: the quick brown fox jumps over the lazy dog. The word ‘the ‘ occurs twice in the file so the data can be compressed like this: the quick brown fox jumps over <<lazy dog. in which << is a pointer to the first 4 characters in the string.
In 1978, Lempel and Ziv published a second paper outlining a similar algorithm that is now referred to as LZ78. This algorithm maintains a separate dictionary.
Suppose you once again want to compress the following string of text: the quick brown fox jumps over the lazy dog. The sequence ‘the ‘ occurs twice so this string is put in an index that is added to the compressed file and this entry is referred to as *. The data then look like this: *quick brown fox jumps over *lazy dog.
How LZW works
In 1984, Terry Welch was working on a compression algorithm for high-performance disk controllers. He developed a rather simple algorithm that was based on the LZ78 algorithm and that is now called LZW.
LZW compression replaces strings of characters with single codes. It does not do any analysis of the incoming text. Instead, it just adds every new string of characters it sees to a table of strings. Compression occurs when a single code is output instead of a string of characters.
The code that the LZW algorithm outputs can be of any arbitrary length, but it must have more bits in it than a single character. The first 256 codes (when using eight-bit characters) are by default assigned to the standard character set. The remaining codes are assigned to strings as the algorithm proceeds. The sample program runs as shown with twelve-bit codes. This means codes 0-255 refer to individual bytes, while codes 256-4095 refer to substrings.
Advantages and disadvantages
- LZW compression works best for files containing lots of repetitive data. This is often the case with text and monochrome images. Files that are compressed but that do not contain any repetitive information at all can even grow bigger!
- LZW compression is fast.
- LZW is a fairly old compression technique. All recent computer systems have the horsepower to use more efficient algorithms.
- Royalties have to be paid to use LZW compression algorithms within applications (see below).
Where is LZW compression used?
LZW compression can be used in a variety of file formats:
- TIFF files
- GIF files
- PDF files – In recent applications, LZW has been replaced by the more efficient Flate algorithm.
Some versions of LZW compression are copyrighted. Their owner, Unisys, demands royalties from any company using their algorithm. This has seriously hampered the popularity of LZW compression and in the long run, we will probably see it being replaced by less costly (read: free) algorithms. As an end-user, you don’t have to worry because only software manufacturers have to pay license fees. But in the end, you do pay for this since the licensing costs have to be covered by the price of the software.