Filtering
We'll look at the compression engine itself shortly, but PNG's performance
is not due solely to an improved compression algorithm. PNG also supports a
precompression step called filtering.
Filtering is a method of reversibly transforming the image data so that the
main compression engine can operate more efficiently. As a simple example,
consider a sequence of bytes increasing uniformly from 1 to 255. Since there
is no repetition in the sequence, it compresses either very poorly or not at
all. But a trivial modification of the sequence--namely, leaving the first
byte alone but replacing each subsequent byte by the difference between it
and its predecessor--transforms the sequence into an extremely compressible
set of 255 identical bytes, each having the value 1.
As a real-life example of this (though still not particularly realistic),
consider the image known as 16million.png. This
24-bit, 512 × 32,768 RGB image contains one pixel of every
possible color--more than 16 million of them altogether. As raw data, it
therefore requires 48 MB to
store. Simple PNG-style compression with no filtering brings it down to
36 MB, only a 25% reduction in size. But with filtering turned on, the same
compression engine reduces it to 115,989 bytes, more than 300 times better
than the nonfiltered case, for a total compression factor of 434![68]
Zowie.
Actual image data is rarely that perfect, but filtering does improve
compression in grayscale and truecolor images, and it can help on some
palette images as well. PNG supports five types of filters, and an encoder
may choose to use a different filter for each row of pixels in the image.
Table 9-1 lists the five filter types.
Table 9-1.
PNG Filter Types
|
Name |
Description |
None |
Each byte is unchanged. |
Sub |
Each byte is replaced with the difference between it and the
``corresponding byte'' to its left. |
Up |
Each byte is replaced with the difference between it and the byte
above it (in the previous row, as it was before filtering).
|
Average |
Each byte is replaced with the difference between it and the average
of the corresponding bytes to its left and above it, truncating any
fractional part.
|
Paeth |
Each byte is replaced with the difference between it and the Paeth
predictor of the corresponding bytes to its left, above it, and to
its upper left.
|
The last method requires some explanation. Invented by Alan Paeth, the
Paeth predictor is computed by first calculating a base value, equal
to the sum of the corresponding bytes to the left and above, minus the byte
to the upper left. (For example, the base value might equal
228 + 228 - 227 = 229.) Then the difference between the base value
and each of the three corresponding bytes is calculated, and the byte that
gave the smallest absolute difference--that is, the one that was closest
to the base value--is used as the predictor and subtracted from the target
byte to give the filtered value. In case of ties, the corresponding byte
to the left has precedence as the predicted value, followed by the one
directly above. Note that all calculations to produce the Paeth predictor
are done using exact integer arithmetic. The final filter calculation, on
the other hand, is done using base-256 modular arithmetic; this is true for
all of the filter types.
Though the concept is simple, there are quite a few subtleties in the actual
mechanics of filtering. Most important among these is that filtering always
operates on bytes, not pixels. For images with pixels smaller than eight bits,
this means that the filter algorithms actually operate on more than one pixel
at a time; for example, in a 2-bit palette or grayscale image, there are
four pixels per byte. This approach improves the efficiency of decoders
by avoiding bit-level manipulations.
At the other end of the spectrum, large pixels (e.g., 24-bit RGB or 64-bit
RGBA) are also operated on as bytes, but only corresponding bytes are
compared. For any given byte, the corresponding byte to its left is the
one offset by the number of bytes per pixel. This means that red bytes in a
truecolor image are compared with red bytes, green with green, and blue with
blue. If there's an alpha channel, the alpha bytes are always compared;
if the sample depth is 16 bits, upper (most significant) bytes are compared
with upper bytes in the same color channel, and lower bytes are compared with
lower. In other words, similar values will always be compared and operated
on, in hopes of improving compression efficiency. Consider an RGB image,
for example; the red, green, and blue values of any given pixel may be quite
different, but neighboring pairs of red, green, and blue will often be
similar. Thus the transformed bytes will tend to be close to zero even if
the original bytes weren't. This is the real point of filtering: most of
the transformed bytes will cluster around zero, thus giving the compression
engine a smaller, more predictable range of byte values to cope with.
What about edges? If the ``corresponding byte'' to the left or above doesn't
exist, the algorithm does not wrap around and use bytes from the other side
of the image; instead, it treats the missing byte as zero. The wraparound
method was, in fact, considered, but aside from the fact that one cannot wrap
the top edge of the image without completely breaking the ability to stream
and progressively display a PNG image, the designers felt that only a few
images would benefit (and minimally, at that), which did not justify the
potential additional complexity.
Interlacing is also a bit of a wrench in the works. For the purposes of
filtering, each interlace pass is treated as a separate image with its own
width and height. For example, in a 256 × 256 interlaced image, the
passes would be treated as seven smaller images with dimensions 32 × 32,
32 × 32, 64 × 32, 64 × 64, 128 × 64, 128 × 128, and
256 × 128, respectively.[69]
This avoids the nasty problem of how to define corresponding bytes between
rows of different widths.
So how does an encoder actually choose the proper filter for each row?
Testing all possible combinations is clearly impossible: even a 20-row
image would require testing over 95 trillion combinations, where ``testing''
would involve filtering and compressing the entire image. A simpler approach,
though still computationally expensive, is to incrementally test-compress each
row, save the smallest result, and repeat for the next row. This amounts to
filtering and compressing the entire image five times, which may be a
reasonable trade-off for an image that will be transmitted and decoded many
times.
But users often have barely enough patience to wait for a single round of
compression, so the PNG development group has come up with a few rules of thumb
(or heuristics) for choosing filters wisely. The first rule is that filters
are rarely useful on palette images, so don't even bother with them. Note,
however, that one has considerable freedom in choosing how to order entries
in the palette itself, so it is possible that a particular method of ordering
would actually result in image data that benefits significantly from filtering.
No one has yet proven this, however, and the most likely approaches would
involve doing statistics on every pair of pixels in the image. Such
approaches would be quite expensive for larger images.
Filters are also rarely useful on low-bit-depth (grayscale) images, although
there have been rare cases in which promoting such an image to 8 bits and then
filtering has been effective. In general, however, filter type None is best.
For grayscale and truecolor images of 8 or more bits per sample, with or
without alpha channels, dynamic filtering is almost always beneficial. The
approach that has by now become standard is known as the minimum sum of
absolute differences heuristic and was first proposed by Lee Daniel Crocker
in February 1995. In this approach, the filtered bytes are treated as signed
values--that is, any value over 127 is treated as negative; 128 becomes
-128 and 255 becomes -1. The absolute value of each is then summed, and
the filter type that produces the smallest sum is chosen. This approach
effectively gives preference to sequences that are close to zero and
therefore is biased against filter type None.
A related heuristic--still experimental at the time of this writing--is
to use the weighted sum of absolute differences. The theory, to some
extent based on empirical evidence, is that switching filters too often can
have a deleterious effect on the main compression engine. A better approach
might be to favor the most recently used filter even if its absolute sum of
differences is slightly larger than that of other filters, in order to produce
a more homogeneous data stream for the compressor--in effect, to allow
short-term losses in return for long-term gains. The standard PNG library
contains code to enable this heuristic, but a considerable amount of
experimentation is yet to be done to determine the best combination of
weighting factors, compression levels, and image types.
One can also imagine heuristics involving higher-order distance metrics (e.g.,
root-mean-square sums), sliding averages, and other statistical methods, but
to date there has been little research in this area. Lossless compression is
a necessity for many applications, but cutting-edge research in image
compression tends to focus almost exclusively on lossy methods, since the
payoff there is so much greater. Even within the lossless domain,
preconditioning the data stream is likely to have less effect than changing
the back-end compression algorithm itself. So let's take a look at that
next.
|