The PNG Guide is an eBook based on Greg Roelofs' book, originally published by O'Reilly.



Interlacing and Progressive Display

We'll wrap up our look at the basic elements of Portable Network Graphics images with a quick consideration of progressive rendering and interlacing. Most computer users these days are familiar with the World Wide Web and the method by which modern browsers present pages. As a rule, the textual part of a web page is displayed first, since it is transmitted as part of the page; then images are displayed, with each one rendered as it comes across the network. Ordinary images are simply painted from the top down, a few lines at a time; this is the most basic form of progressive display.

Some images, however, are in a format that allows them to be rendered as an overall, low-resolution image first, followed by one or more passes that refine it until the complete, full-resolution image is displayed. For GIF and PNG images this is known as interlacing. GIF's approach has four passes and is based on complete rows of the image, making it a one-dimensional method. First every eighth row is displayed; then every eighth row is displayed again, only this time offset by four rows from the initial pass. The third pass consists of every fourth row, and the final pass includes every other row (half of the image).

PNG's interlacing method, on the other hand, is a two-dimensional scheme with seven passes, known as the Adam7 method (after its inventor, Adam Costello). If one imagines the image being broken up into 8 × 8-pixel tiles, then the first pass consists of the upper left pixel in each tile--that is, every eighth pixel, both vertically and horizontally. The second pass also consists of every eighth pixel, but offset four pixels to the right.

Figure 8-4

Figure 8-4: Schematic of an 8 × 8 tile (a) after the third pass and (b) after the fifth pass.

The third pass consists of two pixels per tile, offset by four rows from the first two pixels (see Figure 8-4a). The fourth pass contains four pixels in each tile, offset two columns to the right of each of the first four pixels, and the fifth pass contains eight pixels, offset two rows downward (see Figure 8-4b). The sixth pass fills in the remaining pixels on the odd rows (if the image is numbered starting with row one), and the seventh pass contains all of the pixels for the even rows. Note that, although I've described the method in terms of 8 × 8 tiles, pixels for any given pass are stored as complete rows, not as tiled groups. For example, the fifth pass consists of every other pixel in the entire third row of the image, followed by every other pixel in the seventh row, and so on.

The primary benefit of PNG's two-dimensional interlacing over GIF's one-dimensional scheme is that one can view a crude approximation of the entire image roughly eight times as fast.[66] That is, PNG's first pass consists of one sixty-fourth of the image pixels, whereas GIF's first pass consists of one-eighth of the data. Suppose one were to save a palette image as both an interlaced GIF and an interlaced PNG. Assuming the compression ratio and download speeds were identical for the two files, the PNG image would have completed its fourth pass as the GIF image completed its first. But most browsers that support progressive display do so by replicating pixels to fill in the areas that haven't arrived yet. For the PNG image, that means each pixel at this stage represents a 2 × 4 block, whereas each GIF pixel represents a 1 × 8 strip. In other words, GIF pixels have an 8-to-1 aspect ratio, whereas PNG pixels are 2-to-1. At the end of the next pass for each format (GIF's second pass, PNG's fifth; one-quarter of the image in both cases), the PNG pixels are square 2 × 2 blocks, while the GIF pixels are still stretched, now as 1 × 4 strips. In practical terms, features in the PNG image--particularly embedded text--are much more recognizable than in the GIF image. In fact, readability testing suggests that text of any given size is legible roughly twice as fast with PNG's interlacing method.

[66] As I (foot)noted in Chapter 1, "An Introduction to PNG", this implicitly assumes that one-eighth of the compressed data corresponds to one-eighth of the uncompressed (image) data, which is not quite accurate. The difference is likely to be small in most cases, however. I'll discuss this further in Chapter 9, "Compression and Filtering".

JPEG also supports a form of progressive display, but it is not interlacing in the usual sense of reordering the pixels spatially. Rather, it involves reordering the frequency components that make up a JPEG image, first displaying the low-frequency ones and working up to the highest frequency band; this is known as spectral selection. In addition, progressive JPEG can transmit the most significant bits of each frequency component earlier than the less significant ones, a feature known as successive approximation that is very nearly the same as turning up the JPEG quality setting with each scan. The two approaches can be used separately, but in practice they are almost always used in combination. Because JPEG operates on 8 × 8 blocks of pixels, progressive JPEG bears a strong resemblance to interlaced PNG during the early stages of display, though it tends to have a softer, fuzzier look due to the initial lack of high-frequency components (which is often deliberately enhanced by smoothing in the decoder). This is visible in Figures C-4a and C-4b in the color insert, which represent the second pass of a progressive JPEG image (26% of the compressed data), both unsmoothed and smoothed. Note in particular the blockiness in the shadowed interior of the box and the ``colored outside the lines'' appearance around the child's arms and hands; the first effect is completely eliminated in the smoothed version, and the second is greatly reduced. JPEG's first pass is actually more accurate than PNG's, however, since the low-frequency band for each 8 × 8 pixel block represents an average for all 64 pixels, whereas each 8 × 8 block in PNG's first pass is represented by a single pixel, usually in the upper left corner of the displayed block. By its fifth pass, which represents only 40% of the compressed data, the progressive JPEG version of this image (Figure C-4c) is noticeably sharper and more accurate than all but the final pass of the PNG version. Keep in mind also that, since the PNG is lossless and therefore 11 times as large as the JPEG, 40% of the compressed JPEG data is equivalent to only 3.5% of the PNG data, which corresponds to the beginning of PNG's third pass. This only emphasizes the point made previously: for non-transparent, photographic images on the Web, use JPEG.

Note that smoothing could be applied to the early passes of interlaced PNGs and GIFs, as well; tests suggest that this looks better for photographic images but maybe not as good for simple graphics. (On the other hand, recall that smoothing did seem to enhance the readability of early interlace passes in Figure 1-4.) As for representing blocks by the pixel in the upper left corner, it would be possible to replicate each pixel so that the original would lie roughly at the center of its clones, as long as some care were taken near the edges of the image. This would prevent the apparent shift in some features as later passes are displayed. But neither smoothing nor centered pixel replication is currently supported by the PNG reference library, libpng, as of version 1.0.3.

It is worth noting that TIFF can also support a kind of interlacing, although like everything about TIFF, it is much more arbitrary than either GIF's or PNG's method. Baseline TIFF includes the concept of strips, each of which may include one or more rows of image data though the number of rows per strip is constant. A list of offsets to each strip is embedded within the image, so in principle one could make each strip a row and do GIF-style line interlacing with any ordering one chose. But since TIFF's structure is fundamentally random access in nature, this approach would only work if one imposed certain restrictions on the locations of its internal directory, list of strip offsets, and actual strip data--that is, one would need to define a particular subformat of TIFF.

In addition, libtiff supports a TIFF extension called tiles, in which the image data is organized into rectangular regions instead of strips. Since the tile size can be arbitrary, one could define it to be 1 × 1 and then duplicate PNG's Adam7 interlacing scheme manually--or even extend it to 9, 11, or more passes. However, since every tile must have a corresponding offset in the TIFF image directory, doing something like this would at least double or triple the image size. Also, TIFF's compression methods apply only to individual strips or tiles, so there would be no real possibility of compression aside from reusing tiles in more than one location (that is, by having multiple tile offsets point at the same data). And, as with the strip approach, this would require restrictions on the internal layout of the file. Nevertheless, the capability does exist, at least theoretically.




Last Update: 2010-Nov-26