The PNG Guide is an eBook based on Greg Roelofs' book, originally published by O'Reilly.



The Deflate Compression Algorithm

In some ways compression is responsible for the very existence of the Portable Network Graphics format (recall Chapter 1, "An Introduction to PNG"), and it is undoubtedly one of the most important components of PNG. The PNG specification defines a single compression method, the deflate algorithm, for all image types.

Part of the LZ77 class of compression algorithms, deflate was defined by PKWARE in 1991 as part of the 1.93a beta version of their PKZIP archiver. Independently implemented by Jean-loup Gailly and Mark Adler, first for Info-ZIP's Zip and UnZip utilities and shortly thereafter for the GNU gzip utility, the deflate algorithm is battle-tested and today is probably the most commonly used file-compression algorithm on the Internet. Although it is not the best-compressing algorithm known,[70] deflate has a very desirable mix of characteristics: high reliability, good compression, good encoding speed, excellent decoding speed, minimal overhead on incompressible data, and modest, well-defined memory footprints for both encoding and decoding.

[70] Arithmetic coding has been around for a long time and significantly outperforms deflate; the relatively recently published Burrows-Wheeler block transform coding (implemented in bzip2, for example) shows considerable promise as well; and the patented BitJazz condensation method is likewise quite impressive.

As an LZ77-derived algorithm, deflate is fundamentally based on the concept of a sliding window. One begins with the premise that many types of interesting data, from binary computer instructions to source code to ordinary text to images, are repetitious to varying degrees. The basic idea of a sliding window is to imagine a window of some width immediately preceding the current position in the data stream (and therefore sliding along as the current position is updated), which one can use as a kind of dictionary to encode subsequent data. For example, if the text of this chapter is the data stream, then the current position at the very instant you read this is here. Preceding that point is a little more than 13,000 bytes of text, which includes, among other things, six copies of the fragment ``or example'' (whoa, there's another one!). Instead of encoding such strings as literal text, deflate replaces each with a pair of numbers indicating its length (in this case, 10 bytes) and the distance back to one of the previous instances (perhaps 950 bytes between the fifth and sixth). The greater the length of the string, the greater the savings in encoding it as a pointer into the window.

There are various ways to implement LZ77; the approach used by deflate is a ``greedy'' algorithm originally devised by James Storer and Thomas Szymanski--hence its name, LZSS. LZSS employs a look-ahead buffer and finds the longest match for the buffer within the sliding window. If the match exceeds a given threshold length, the string is encoded as a length/distance pair and the buffer is advanced a corresponding amount. If the longest match is not sufficiently long, the first character in the look-ahead buffer is output as a literal value, and the buffer is advanced by one. Either way, the algorithm continues by seeking the longest match for the new contents of the buffer.

The deflate algorithm is actually a bit more clever than the preceding description would suggest. Rather than simply storing the length/distance pairs and literal bytes as is, it further compresses the data by Huffman-encoding the LZ77 output. This approach is generically referred to as LZH; deflate's uniqueness lies in its method of combining literals and lengths into a single Huffman tree, its use of both fixed and dynamic Huffman codes, and its division of the output stream into blocks so that regions of incompressible data can be stored as is, rather than expanding significantly, as can happen with the LZW algorithm.

The PNG specification further dictates that the deflate data stream must conform to the zlib 1.0 format. In particular, the size of the sliding window must be a power of 2 between 256 bytes and 32 kilobytes, inclusive, and a small zlib header and trailer are required. The latter includes a 32-bit checksum on the uncompressed data; recall that the compressed stream is already covered by PNG's 32-bit CRC value in each IDAT chunk.

More detailed explanation of the deflate algorithm and the zlib data format is beyond the scope of this book, but the full zlib and deflate specifications are available from http://www.zlib.org/zlib_docs.html . In addition, a reference such as The Data Compression Book, by Mark Nelson and Jean-loup Gailly, is invaluable for understanding many compression algorithms, including LZ77 and LZSS.

Practically speaking, independent implementation of the deflate algorithm is both difficult and unnecessary. Almost every PNG implementation available today makes use of the freely available zlib compression library, and the examples in Part III, Programming with PNG, do so as well.[71] For now I merely note that zlib supports ten compression levels (including one with no compression at all), differing in the algorithms used to find matching strings and in the thresholds for terminating the search prematurely.

[71] Nevertheless, at least one alternative (in C++) is available as part of Colosseum Builders' Image Library, and it is also described in a book by John Miano, The Programmer's Guide to Compressed Image Files.

As an aside, note that the efficiency of the compression engine increases as more data is processed, with peak efficiency being reached when there is sufficient data to fill the sliding window. This occurs mainly because there are fewer strings available in the ``dictionary,'' but also, initially, because those strings that do exist are limited in length--obviously, they cannot be any longer than the amount of data in the window. Thus, for example, when 25% of the compressed data has been received, it may correspond to only 20% of the pixels. But because of data buffering in network protocols and applications, any large disparities due to the truly low-efficiency encoding at startup will tend to be washed out at the 512-byte level (or higher). That is, even though the first 50 bytes might represent only 1% compression, those bytes generally will not be available until after the 512th byte has been received, by which point the compression efficiency may have reached 10% or better. And since this is generally true of most compression algorithms, including those used by both GIF and PNG, it is reasonable to compare (as I did in Chapter 1, "An Introduction to PNG") the appearance of the uncompressed pixels at an instant when equal amounts of compressed data have been received.

A Final Word on Patents

As mentioned at the end of Chapter 7, "History of the Portable Network Graphics Format", Stac has reportedly claimed that the deflate algorithm is covered by two of their patents. In fact, there are a number of patents that can be infringed upon by a compliant deflate implementation, including one held by PKWARE itself that involves sorted hash tables. But the deflate specification includes a section on implementing the algorithm without infringing,[72] and, of course, zlib itself follows that prescription. While these things are never 100% certain unless and until they are tested in court, developers and users can be reasonably confident that the use of zlib and its implementation of the deflate algorithm is not subject to licensing fees.

[72] From Section 4 of the deflate specification, Compression algorithm details: ``...it is strongly recommended that the implementor of a compressor follow the general algorithm presented here, which is known not to be patented per se.''




Last Update: 2010-Nov-26