The PNG Guide is an eBook based on Greg Roelofs' book, originally published by O'Reilly.


We'll look at the compression engine itself shortly, but PNG's performance is not due solely to an improved compression algorithm. PNG also supports a precompression step called filtering. Filtering is a method of reversibly transforming the image data so that the main compression engine can operate more efficiently. As a simple example, consider a sequence of bytes increasing uniformly from 1 to 255. Since there is no repetition in the sequence, it compresses either very poorly or not at all. But a trivial modification of the sequence--namely, leaving the first byte alone but replacing each subsequent byte by the difference between it and its predecessor--transforms the sequence into an extremely compressible set of 255 identical bytes, each having the value 1.
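The differencing transform described above can be sketched in a few lines of Python (modulo-256 subtraction, as PNG filters use):

```python
# A minimal sketch of the precompression idea: differencing a uniformly
# increasing sequence turns it into a highly repetitive, compressible one.
data = bytes(range(1, 256))  # 1, 2, 3, ..., 255: no repetition at all

# Leave the first byte alone; replace each subsequent byte with the
# difference between it and its predecessor, modulo 256.
filtered = bytes([data[0]]) + bytes(
    (data[i] - data[i - 1]) % 256 for i in range(1, len(data))
)
# filtered is now 255 identical bytes, each with the value 1
```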

As a real-life example of this (though still not particularly realistic), consider the image known as 16million.png. This 24-bit, 512 × 32,768 RGB image contains one pixel of every possible color--more than 16 million of them altogether. As raw data, it therefore requires 48 MB to store. Simple PNG-style compression with no filtering brings it down to 36 MB, only a 25% reduction in size. But with filtering turned on, the same compression engine reduces it to 115,989 bytes, more than 300 times better than the nonfiltered case, for a total compression factor of 434![68] Zowie.

[68] Actually, it gets even better. The dimensions of the image were chosen for convenient web-browser scrolling, but a 4096 × 4096 version created by Paul Schmidt is half the size--a mere 59,852 bytes (841 times compression). And just wait until we get to the chapter on MNG!

Actual image data is rarely that perfect, but filtering does improve compression in grayscale and truecolor images, and it can help on some palette images as well. PNG supports five types of filters, and an encoder may choose to use a different filter for each row of pixels in the image. Table 9-1 lists the five filter types.

Table 9-1. PNG Filter Types

 Name     Description
 None     Each byte is unchanged.
 Sub      Each byte is replaced with the difference between it and the ``corresponding byte'' to its left.
 Up       Each byte is replaced with the difference between it and the byte above it (in the previous row, as it was before filtering).
 Average  Each byte is replaced with the difference between it and the average of the corresponding bytes to its left and above it, truncating any fractional part.
 Paeth    Each byte is replaced with the difference between it and the Paeth predictor of the corresponding bytes to its left, above it, and to its upper left.
The last method requires some explanation. Invented by Alan Paeth, the Paeth predictor is computed by first calculating a base value, equal to the sum of the corresponding bytes to the left and above, minus the byte to the upper left. (For example, the base value might equal 228 + 228 - 227 = 229.) Then the difference between the base value and each of the three corresponding bytes is calculated, and the byte that gave the smallest absolute difference--that is, the one that was closest to the base value--is used as the predictor and subtracted from the target byte to give the filtered value. In case of ties, the corresponding byte to the left has precedence as the predicted value, followed by the one directly above. Note that all calculations to produce the Paeth predictor are done using exact integer arithmetic. The final filter calculation, on the other hand, is done using base-256 modular arithmetic; this is true for all of the filter types.
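The prediction step described above can be expressed directly in code. This sketch follows the tie-breaking order given in the text (left first, then above):

```python
def paeth_predictor(left, above, upper_left):
    """Paeth predictor: pick whichever of the three neighboring bytes
    is closest to the base value left + above - upper_left.
    Ties go to left, then above, then upper left."""
    base = left + above - upper_left  # exact integer arithmetic, no wrap
    d_left = abs(base - left)
    d_above = abs(base - above)
    d_upper_left = abs(base - upper_left)
    if d_left <= d_above and d_left <= d_upper_left:
        return left
    if d_above <= d_upper_left:
        return above
    return upper_left

# The example from the text: left = 228, above = 228, upper left = 227
# gives a base value of 229; left wins the tie, so the predictor is 228.
```

Only the final subtraction of the predictor from the target byte is done modulo 256; the predictor itself is computed exactly, as noted above.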

Though the concept is simple, there are quite a few subtleties in the actual mechanics of filtering. Most important among these is that filtering always operates on bytes, not pixels. For images with pixels smaller than eight bits, this means that the filter algorithms actually operate on more than one pixel at a time; for example, in a 2-bit palette or grayscale image, there are four pixels per byte. This approach improves the efficiency of decoders by avoiding bit-level manipulations.

At the other end of the spectrum, large pixels (e.g., 24-bit RGB or 64-bit RGBA) are also operated on as bytes, but only corresponding bytes are compared. For any given byte, the corresponding byte to its left is the one offset by the number of bytes per pixel. This means that red bytes in a truecolor image are compared with red bytes, green with green, and blue with blue. If there's an alpha channel, the alpha bytes are always compared; if the sample depth is 16 bits, upper (most significant) bytes are compared with upper bytes in the same color channel, and lower bytes are compared with lower. In other words, similar values will always be compared and operated on, in hopes of improving compression efficiency. Consider an RGB image, for example; the red, green, and blue values of any given pixel may be quite different, but neighboring pairs of red, green, and blue will often be similar. Thus the transformed bytes will tend to be close to zero even if the original bytes weren't. This is the real point of filtering: most of the transformed bytes will cluster around zero, thus giving the compression engine a smaller, more predictable range of byte values to cope with.
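As an illustration of the corresponding-byte rule, here is a sketch of filter type Sub applied to one scanline, where ``to the left'' means offset by the number of bytes per pixel (bytes with no left neighbor are differenced against zero):

```python
def sub_filter(scanline, bytes_per_pixel):
    """Sketch of PNG filter type Sub: each byte minus the corresponding
    byte to its left, i.e. the byte bytes_per_pixel positions earlier,
    modulo 256. Red is compared with red, green with green, and so on.
    Bytes of the first pixel, which have no left neighbor, are
    differenced against zero."""
    out = bytearray()
    for i, b in enumerate(scanline):
        left = scanline[i - bytes_per_pixel] if i >= bytes_per_pixel else 0
        out.append((b - left) % 256)
    return bytes(out)

# Two similar RGB pixels: the filtered bytes for the second pixel
# cluster near zero even though the raw bytes don't.
row = bytes([200, 100, 50, 202, 101, 50])
# sub_filter(row, 3) -> bytes([200, 100, 50, 2, 1, 0])
```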

What about edges? If the ``corresponding byte'' to the left or above doesn't exist, the algorithm does not wrap around and use bytes from the other side of the image; instead, it treats the missing byte as zero. The wraparound method was, in fact, considered, but aside from the fact that one cannot wrap the top edge of the image without completely breaking the ability to stream and progressively display a PNG image, the designers felt that only a few images would benefit (and minimally, at that), which did not justify the potential additional complexity.

Interlacing is also a bit of a wrench in the works. For the purposes of filtering, each interlace pass is treated as a separate image with its own width and height. For example, in a 256 × 256 interlaced image, the passes would be treated as seven smaller images with dimensions 32 × 32, 32 × 32, 64 × 32, 64 × 64, 128 × 64, 128 × 128, and 256 × 128, respectively.[69] This avoids the nasty problem of how to define corresponding bytes between rows of different widths.

[69] Yes, that adds up to the right number of pixels. (Go ahead, add it up.) Note that things may not come out quite so cleanly in cases in which one or both image dimensions are not evenly divisible by eight; the width of each pass is rounded up, if necessary.
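The per-pass dimensions, including the rounding-up mentioned in the footnote, follow directly from the Adam7 pixel layout defined in the PNG specification (starting offsets and step sizes in each direction):

```python
# Adam7 interlacing: (x_start, x_step, y_start, y_step) for each pass,
# per the PNG specification.
ADAM7 = [(0, 8, 0, 8), (4, 8, 0, 8), (0, 4, 4, 8), (2, 4, 0, 4),
         (0, 2, 2, 4), (1, 2, 0, 2), (0, 1, 1, 2)]

def pass_dimensions(width, height):
    """Width and height of each Adam7 pass, treated as a separate
    sub-image for filtering. The ceiling division is where the
    rounding-up happens when a dimension isn't divisible by eight."""
    return [((width - xs + xstep - 1) // xstep,
             (height - ys + ystep - 1) // ystep)
            for xs, xstep, ys, ystep in ADAM7]
```

For a 256 × 256 image this yields exactly the seven sub-images listed above, and their pixel counts sum to 256 × 256.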

So how does an encoder actually choose the proper filter for each row? Testing all possible combinations is clearly impossible: with five filter choices per row, even a 20-row image would require testing 5^20--over 95 trillion--combinations, where ``testing'' would involve filtering and compressing the entire image. A simpler approach, though still computationally expensive, is to incrementally test-compress each row with each filter, save the smallest result, and repeat for the next row. This amounts to filtering and compressing the entire image five times, which may be a reasonable trade-off for an image that will be transmitted and decoded many times.
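The incremental test-compression idea can be sketched with Python's zlib module, whose compressor objects can be copied mid-stream. The helper filter_row(row, ftype), which returns the filtered bytes of a row under a given filter type, is assumed here, not part of any real API:

```python
import zlib

def choose_filters(rows, filter_row):
    """Sketch of per-row filter selection by trial compression.
    filter_row(row, ftype) is an assumed helper returning the filtered
    bytes of that row. Each candidate is compressed against a *copy*
    of the running compressor state, so trials don't disturb it."""
    comp = zlib.compressobj()
    chosen = []
    for row in rows:
        sizes = {}
        for ftype in range(5):  # None, Sub, Up, Average, Paeth
            trial = comp.copy()
            data = bytes([ftype]) + filter_row(row, ftype)
            # flush() is safe here: it only finalizes the throwaway copy
            sizes[ftype] = len(trial.compress(data)) + len(trial.flush())
        best = min(sizes, key=sizes.get)
        chosen.append(best)
        # commit the winning row to the real compressor state
        comp.compress(bytes([best]) + filter_row(row, best))
    return chosen
```

A real encoder would also accumulate the compressor's output; this sketch only records which filter wins each row.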

But users often have barely enough patience to wait for a single round of compression, so the PNG development group has come up with a few rules of thumb (or heuristics) for choosing filters wisely. The first rule is that filters are rarely useful on palette images, so don't even bother with them. Note, however, that one has considerable freedom in choosing how to order entries in the palette itself, so it is possible that a particular method of ordering would actually result in image data that benefits significantly from filtering. No one has yet proven this, however, and the most likely approaches would involve doing statistics on every pair of pixels in the image. Such approaches would be quite expensive for larger images.

Filters are also rarely useful on low-bit-depth (grayscale) images, although there have been rare cases in which promoting such an image to 8 bits and then filtering has been effective. In general, however, filter type None is best.

For grayscale and truecolor images of 8 or more bits per sample, with or without alpha channels, dynamic filtering is almost always beneficial. The approach that has by now become standard is known as the minimum sum of absolute differences heuristic and was first proposed by Lee Daniel Crocker in February 1995. In this approach, the filtered bytes are treated as signed values--that is, any value over 127 is treated as negative; 128 becomes -128 and 255 becomes -1. The absolute value of each is then summed, and the filter type that produces the smallest sum is chosen. This approach effectively gives preference to sequences that are close to zero and therefore is biased against filter type None.
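A minimal sketch of that scoring function, with filtered bytes over 127 treated as negative as described:

```python
def signed_abs_sum(filtered_row):
    """Minimum-sum-of-absolute-differences scoring: treat each filtered
    byte as a signed value (128 -> -128, 255 -> -1), then sum the
    absolute values. The filter producing the smallest sum is chosen."""
    return sum(b if b < 128 else 256 - b for b in filtered_row)

# Hypothetical filtered outputs for one row under two filter types;
# the row whose bytes cluster nearest zero scores lowest and wins.
candidates = {"None": bytes([5, 251, 2]),   # scores 5 + 5 + 2 = 12
              "Sub":  bytes([5, 1, 1])}     # scores 5 + 1 + 1 = 7
best = min(candidates, key=lambda name: signed_abs_sum(candidates[name]))
```

Because filter type None leaves raw byte values intact, its sums are rarely small, which is why this metric is biased against it.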

A related heuristic--still experimental at the time of this writing--is to use the weighted sum of absolute differences. The theory, to some extent based on empirical evidence, is that switching filters too often can have a deleterious effect on the main compression engine. A better approach might be to favor the most recently used filter even if its absolute sum of differences is slightly larger than that of other filters, in order to produce a more homogeneous data stream for the compressor--in effect, to allow short-term losses in return for long-term gains. The standard PNG library contains code to enable this heuristic, but a considerable amount of experimentation is yet to be done to determine the best combination of weighting factors, compression levels, and image types.

One can also imagine heuristics involving higher-order distance metrics (e.g., root-mean-square sums), sliding averages, and other statistical methods, but to date there has been little research in this area. Lossless compression is a necessity for many applications, but cutting-edge research in image compression tends to focus almost exclusively on lossy methods, since the payoff there is so much greater. Even within the lossless domain, preconditioning the data stream is likely to have less effect than changing the back-end compression algorithm itself. So let's take a look at that next.

Last Update: 2010-Nov-26