Overview of Image Properties

Before we dive right into some of PNG's more interesting features, it might be helpful to introduce (or review) some essential image concepts and take a quick look at a few older image formats. Those who are already familiar with the most basic features of computer images can skip directly to the next section.

There are two main classes of computer image formats: raster and vector. Raster images are based on colored dots, which are almost always stored in a rectangular array and are usually packed so close together that individual dots are no longer distinguishable. Vector images are based on lines, circles, and other ``primitive'' elements that typically cover a sizable area and are easily distinguishable from one another. Many images can be represented in either format; indeed, any vector-based image can be approximated by a raster image (lots of dots), and one could easily (though tediously) simulate a raster image in vector format by converting each dot to a tiny box.

The whole point of having two classes of image formats--and, indeed, of having numerous individual file formats--is implicit in the old saying, ``Use the best tool for the job.'' Vector formats are appropriate for simple graphics and text, such as corporate logos, and their advantage is that they can be extremely compact and yet maintain perfect sharpness regardless of the size at which they are reproduced. But with the exception of pen-based plotters and some ancient vector-based displays, the end result is almost always a raster image.

For that reason, plus the fact that raster image formats are more common--and because PNG is one of them--we'll take a closer look at raster features. As I just noted, a raster image is composed of an array of dots, more commonly referred to as pixels (short for picture elements). One generally refers to a computer image's dimensions in terms of pixels; this is also often (though slightly imprecisely) known as its resolution. Some common image sizes are 640 × 480, 800 × 600, and 1024 × 768 pixels, which also happen to be common dimensions for computer displays.

In addition to horizontal and vertical dimensions, a raster image is characterized by depth. The deeper the image, the more colors (or shades of gray) it can have. Pixel depths are measured in bits, the tiniest units of computer storage; a 1-bit image can represent two colors (often, though not necessarily, black and white), a 2-bit image four colors, an 8-bit image 256 colors, and so on. To calculate the raw size of the image data before any compression takes place, one needs only to know that 8 bits make a byte. Thus a 320 × 240, 24-bit image has 76,800 pixels, each of which is 3 bytes deep, so its total uncompressed size is 230,400 bytes.
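The size arithmetic above can be sketched in a few lines of Python. The function name `raw_image_size` is hypothetical, and the sketch assumes pixels are packed end to end with no per-row padding, which individual file formats may not do.

```python
def raw_image_size(width: int, height: int, bits_per_pixel: int) -> int:
    """Return the uncompressed size of the image data in bytes.

    Assumption: pixels are packed with no padding between rows.
    """
    total_bits = width * height * bits_per_pixel
    # 8 bits make a byte; round up so a partial final byte is counted.
    return (total_bits + 7) // 8


def colors_at_depth(bits_per_pixel: int) -> int:
    """Return how many distinct values a pixel of the given depth can hold."""
    return 2 ** bits_per_pixel


print(raw_image_size(320, 240, 24))   # the 320 x 240, 24-bit example
print(colors_at_depth(1), colors_at_depth(2), colors_at_depth(8))
```

The same function shows how quickly depth affects size: the identical 320 × 240 image at 8 bits per pixel needs one third of the storage.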

I'll return to the topic of compression in just a moment; first, let's take a closer look at the precise relationship between pixels and colors. Within the broad class of raster formats, there are three main image types: indexed-color, grayscale, and truecolor. The indexed-color method, also known as pseudocolor, colormapped, or palette-based, stores a copy of each color value needed for the image in a palette. The main image is then composed of index values referring to different entries in the palette. For example, imagine an image composed entirely of red, white, and blue pixels; the palette would have three entries corresponding to these colors, and each pixel would be represented by the value 0, 1, or 2. (The natural starting point for numbers on a computer is 0, not 1.) Since an image 2 bits deep can represent up to four colors, each pixel in this example would require only 2 bits, even though the precise shades of red, white, and blue might ordinarily require 24 bits each.
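The red, white, and blue example can be written out as a minimal sketch. The particular RGB values, the tiny 3 × 2 image, and the helper names `expand_indices` and `bits_needed` are all illustrative assumptions, not part of any file format.

```python
# Palette: one (red, green, blue) entry per color used by the image.
palette = [
    (255, 0, 0),      # index 0: red
    (255, 255, 255),  # index 1: white
    (0, 0, 255),      # index 2: blue
]

# The image itself stores only small index values, one per pixel.
indices = [
    [0, 1, 2],
    [2, 1, 0],
]


def expand_indices(indices, palette):
    """Look up every index in the palette to recover the full colors."""
    return [[palette[i] for i in row] for row in indices]


def bits_needed(palette_size: int) -> int:
    """Return the smallest pixel depth whose range covers the palette."""
    return max(1, (palette_size - 1).bit_length())


print(expand_indices(indices, palette)[0])
print(bits_needed(len(palette)))  # three entries fit in 2 bits
```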

Grayscale and truecolor images are simpler in concept; the bits used by each pixel correspond directly to shades of gray or to colors. In a grayscale image of a particular pixel depth, a 0 pixel usually (though not always) means black, while the maximum value at that depth corresponds to white. Intermediate pixel values are smoothly interpolated to shades of gray, though this is often not as straightforward as it might sound--gamma correction, a way of adjusting for differences in computer display systems, comes in here. I'll give a brief overview of gamma correction later in this chapter, and I'll discuss it at length in Chapter 10, "Gamma Correction and Precision Color"; for now, I'll merely note that it is a Good Thing, and image formats that provide support for it can be viewed on different platforms without appearing too light on one and too dark on another.

A truecolor image uses three separate values for each pixel, corresponding to shades of red, green, and blue. Such images are often also referred to as RGB. In Chapter 8, "PNG Basics", I'll talk about human vision and the reasons why mixtures of just three colors can appear to reproduce all colors, or at least a sufficiently large percentage of them that one need not quibble over the difference. I'll also mention some common alternatives to the RGB color space. To be considered truly truecolor instead of merely ``high color,'' an image must contain at least 8 bits for each of the three colors in each pixel; thus, at a minimum, a truecolor image has a depth of 24 bits.

Two other concepts--samples and channels--are handy when speaking of images, and RGB images are a good way to illustrate these concepts. A sample is one component of a single color value. For example, each pixel in a truecolor image consists of three samples: red, green, and blue. If the image is 24 bits deep, then each sample is 8 bits deep. A 256-shade grayscale image also has 8-bit samples, which means that one can speak of the ``bits per sample'' for either image type to indicate the level of precision of each shade or color. Note that I have been careful to distinguish between sample depth and pixel depth. The two terms are directly related in grayscale and truecolor images, but in indexed-color images they can be independent of each other. This is because the sample depth refers to the color values in the palette, while the pixel depth refers to the index values of each pixel (which reference the palette colors). To put it more concretely, the color values in the palette are usually 24-bit values (8 bits per sample), but the pixel indices are usually 8 bits or less. Our previous red, white, and blue example used only two bits per pixel.
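To make the relationship between samples and pixels concrete, here is a small sketch that packs three 8-bit samples into one 24-bit truecolor pixel and unpacks them again. The function names and the red-green-blue ordering within the integer are assumptions chosen for illustration; actual formats define their own layouts.

```python
SAMPLE_DEPTH = 8                       # bits per sample
SAMPLE_MAX = (1 << SAMPLE_DEPTH) - 1   # largest 8-bit sample value, 255


def pack_rgb(red: int, green: int, blue: int) -> int:
    """Combine three 8-bit samples into a single 24-bit pixel value."""
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel: int):
    """Split a 24-bit pixel value back into its three 8-bit samples."""
    return (pixel >> 16) & SAMPLE_MAX, (pixel >> 8) & SAMPLE_MAX, pixel & SAMPLE_MAX


pixel = pack_rgb(255, 128, 0)
print(pixel.bit_length() <= 3 * SAMPLE_DEPTH)  # pixel depth is 3 x sample depth
print(unpack_rgb(pixel))
```

In an indexed-color image the palette entries would hold values like these, while each pixel would hold only a short index, so the two depths no longer move together.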

A channel, on the other hand, refers to the collection of all samples of a given type in an image--for example, the green components of every RGB pixel. Thus a truecolor image has three channels, while a grayscale image has only one. (Ordinarily one does not speak of a palette-based image as having channels.) And when discussing transparency, yet another channel type is often used: the alpha channel. This is a special kind of channel in that it does not provide actual color information but rather a level of transparency for each pixel--or, more precisely, a level of opacity, since it is most common for the maximum sample value to indicate that the pixel is completely opaque and for zero to indicate complete transparency. A truecolor image with an alpha channel is often called an RGBA image; grayscale images with alpha channels are rarer and don't have a special abbreviation (although I may refer to them as ``gray+alpha'').
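The following sketch separates an RGBA image into its four channels and shows one common way an alpha sample can be used to blend a pixel with a background. The blending formula is the usual weighted average for 8-bit, non-premultiplied alpha; it is an assumption introduced here for illustration, and the function names are hypothetical.

```python
def split_channels(pixels):
    """Return one list per channel from a flat list of (R, G, B, A) pixels."""
    return [list(channel) for channel in zip(*pixels)]


def blend_sample(foreground: int, background: int, alpha: int, max_value: int = 255) -> int:
    """Weight one foreground sample against the background by opacity.

    alpha == max_value means fully opaque; alpha == 0 means fully transparent.
    """
    weighted = foreground * alpha + background * (max_value - alpha)
    # Add half the divisor so the integer division rounds to nearest.
    return (weighted + max_value // 2) // max_value


def composite_pixel(rgba, background_rgb):
    """Blend the three color samples of one RGBA pixel over a background."""
    red, green, blue, alpha = rgba
    return tuple(
        blend_sample(fg, bg, alpha)
        for fg, bg in zip((red, green, blue), background_rgb)
    )


pixels = [(200, 100, 0, 255), (200, 100, 0, 0)]
red, green, blue, alpha = split_channels(pixels)
print(alpha)                                     # the alpha channel alone
print(composite_pixel(pixels[0], (50, 50, 50)))  # opaque: foreground wins
print(composite_pixel(pixels[1], (50, 50, 50)))  # transparent: background shows
```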

Palette-based images almost never have a full alpha channel, but another type of transparency is possible. Rather than associate alpha information with every pixel, one can instead associate it with specific palette entries. By far the most common approach is to specify that a single palette entry represents complete transparency. Then when the image is displayed against some sort of background, any pixel whose index refers to this particular palette entry will be replaced by the background at the pixel's location--or perhaps the pixel simply will not be drawn in the first place. But there is no conceptual requirement that only one palette entry can have transparency, nor that it must be fully transparent. As we'll see shortly, PNG effectively allows any number of palette entries to have any level of transparency.
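The single-transparent-entry approach described above might look like the sketch below. The palette contents, the choice of index 1 as the transparent entry, and the function name `draw_over` are illustrative assumptions.

```python
palette = [
    (255, 0, 0),      # index 0: red
    (255, 255, 255),  # index 1: designated as fully transparent below
    (0, 0, 255),      # index 2: blue
]
TRANSPARENT_INDEX = 1  # hypothetical choice of the transparent palette entry


def draw_over(indices, palette, background, transparent_index):
    """Draw an indexed image over a background of the same dimensions.

    Pixels whose index matches transparent_index are skipped, so the
    background color at that location remains visible.
    """
    result = []
    for index_row, background_row in zip(indices, background):
        result.append([
            bg if i == transparent_index else palette[i]
            for i, bg in zip(index_row, background_row)
        ])
    return result


indices = [[0, 1, 2]]
background = [[(10, 20, 30)] * 3]
print(draw_over(indices, palette, background, TRANSPARENT_INDEX))
```

Generalizing this to several partially transparent entries would mean storing an opacity value alongside each palette entry and blending rather than skipping.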

While we're on the subject of colormapped images, two other concepts are worth mentioning: quantization and dithering. Suppose one has a 24-bit truecolor image, but it must be displayed on a 256-color, palette-based display. Since truecolor images typically use anywhere from 10,000 to 100,000 colors, the conversion to a colormapped image will involve replacing the many original color values with a much smaller set of colors. This process is known as quantization. Because the resulting images have such a limited palette of colors available to them, they often are unable to represent fine color gradients such as the different shades of blue seen in the sky or the range of facial tones in a softly lit portrait. One way around this is to dither the image, which is a means of mixing pixels of the available colors together to give the appearance of other colors (though generally at the cost of some sharpness). For example, a checkerboard pattern of alternating red and yellow pixels might appear orange. This effect is perhaps best illustrated with an example. Figure 1-1 shows a truecolor photograph (here rendered in grayscale) together with two 256-color versions of the same image--one simply quantized to 256 colors and the other both quantized and dithered. The insets give a magnified view of one region, showing the relative effects of the two procedures.

Figure 1-1: (a) Original, 24-bit image; (b) same image after quantization, and (c) after quantization and dithering.
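The difference between plain quantization and dithering can also be sketched numerically. The example below works on a single row of 8-bit gray samples and reduces it to two levels (black and white). The dithering shown is a simplified one-dimensional error diffusion, chosen here only for brevity; it is not necessarily the method used to produce Figure 1-1, and the function names are hypothetical.

```python
def quantize(value: float, levels: int) -> int:
    """Map a sample to the nearest of `levels` evenly spaced 8-bit values."""
    step = 255 / (levels - 1)
    index = round(value / step)
    index = min(levels - 1, max(0, index))  # keep out-of-range input in bounds
    return round(index * step)


def quantize_row(row, levels):
    """Quantize each sample independently."""
    return [quantize(value, levels) for value in row]


def dither_row(row, levels):
    """Quantize a row while carrying each pixel's rounding error forward."""
    output = []
    error = 0.0
    for value in row:
        target = value + error        # include the error from the previous pixel
        chosen = quantize(target, levels)
        output.append(chosen)
        error = target - chosen       # pass the new error to the next pixel
    return output


row = [64] * 8                        # a flat, dark gray strip
plain = quantize_row(row, 2)
dithered = dither_row(row, 2)
print(plain, sum(plain) / len(plain))
print(dithered, sum(dithered) / len(dithered))
```

Plain quantization turns the whole strip black, while the dithered row mixes black and white pixels so that its average brightness stays close to the original gray.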

I'll round out our review of image properties and concepts with a quick look at compression. There are really only two flavors: lossless and lossy. Lossless compression preserves the exact image data down to the last bit, so that what you get out after uncompressing is exactly the same as what you started with. In contrast, lossy compression throws away some of the data in return for much better compression ratios. For photographic images, the best lossless methods may only manage a factor of two or three in compression, whereas lossy methods typically achieve anywhere from 8 to 25 times reduction with very little visible loss of quality. I'll discuss the details of compression, particularly the lossless variety, at greater length in Chapter 9, "Compression and Filtering".
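The defining property of lossless compression, that decompression returns exactly the original data, can be checked with Python's standard `zlib` module, used here simply as a readily available lossless compressor. The sample data is synthetic and highly repetitive, so the ratio it achieves says nothing about real photographs. The "lossy" step is a deliberately crude stand-in (discarding the low four bits of each byte) meant only to show that discarded information cannot be recovered; real lossy formats work very differently.

```python
import zlib

# Synthetic "image data": a smooth ramp in which each value repeats four times.
raw = bytes((x // 4) % 256 for x in range(4096))

# Lossless: compress, decompress, and compare byte for byte.
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)
print(restored == raw)
print(len(raw), len(compressed))

# Crude lossy stand-in: throw away the low 4 bits of every byte.
reduced = bytes(b & 0xF0 for b in raw)
print(reduced == raw)  # the discarded bits are gone for good
print(len(zlib.compress(reduced)) <= len(compressed))
```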

Finally, in describing the advantages of PNG, I will necessarily compare it with some older image formats. Although there are literally hundreds of different formats, we will be most concerned with just three: GIF, JPEG, and TIFF. GIF, short for the Graphics Interchange Format, and JPEG, short for the Joint Photographic Experts Group (which defined the format), are both very common image types often seen on the Web. TIFF, on the other hand, short for Tagged Image File Format, is almost never used on the Web but is quite popular as an output format from scanners and as an intermediate ``save format'' while editing images. I'll touch on the properties of each of these formats as we go.