The PNG Guide is an eBook based on Greg Roelofs' book, originally published by O'Reilly.



Transfer Functions and Gamma

To understand the solutions, one must first become acquainted with the problems. I won't attempt to cover the subject in detail; an entire book could be written on it--and, indeed, Charles Poynton has done just that. But I will give a brief overview of the main issues and explain how some of the features of the Portable Network Graphics format fit into the picture. I may even mention some physics and an equation or three, but you shouldn't need a technical degree to be able to understand the basic ideas.

The ultimate goal of the entire process is for the light that leaves your monitor to produce the same perception as the light that originally entered the camera would have if it had entered your eyeballs instead. Alternatively, for images created with an image-editing application, the goal is for your display to produce the same perception (and basically the same light) as the artist's monitor produced while he was creating the image. Clearly this involves both the encoding process, performed by the editor or conversion program that writes the image file, and the decoding process, performed by the viewer or browser that reads and displays the image, as well as aspects of human physiology and psychology. We'll refer to the combination of the encoding and decoding processes as the end-to-end process. PNG's role is to provide a way to store not only the image samples (that is, the color components of each pixel) but also the information needed to relate those samples to the desired output of the display. A decoder that has both that information and knowledge of how the user's display system behaves can then deduce how the image samples must be transformed in order to produce the correct output.

Storing the image samples themselves is easy. The tricky part is figuring out the two additional pieces of critical information: when encoding, how the original light is related to the samples, and when decoding, how image samples are related to the display's actual output (i.e., the reproduced light). The fundamental problem is that working with and storing light is nearly impossible; instead, light is typically converted to electrical signals. Indeed, there are several more conversions along the way, each of which potentially modifies the data in some way.

As a concrete example, in an image captured via a video or electronic camera, light entering the camera is first converted to analog voltages, which are in turn converted to other voltages representing digital ones and zeros. These are stored in an image file as magnetic fields on a hard disk or as tiny pits on a CD-ROM. For display, the digital data in the file is optionally modified by the viewing application (this is where gamma correction and other tweaking are performed), then possibly converted again according to a lookup table (LUT), then generally converted by a graphics card (``frame buffer'') back to an analog electrical signal.[77] This analog signal is then converted by the monitor's electronics into a directed beam of electrons that excites various phosphors at the front of the monitor and thereby is converted back into light. Clearly, there is a bit of complexity here (no pun intended).

[77] Early PC graphics cards (the ``CGA'' and ``EGA'' adapters, for example) communicated with the monitor digitally. Ironically, the burgeoning popularity of flat-panel displays and digital television is driving manufacturers back to using digital links between the frame buffer and display. As of early 1999, the standards and products were rare to nonexistent, but they're coming.

But all is not lost! One can simplify this model in several ways. For example, conversions from analog to digital and from digital to analog are well behaved--they introduce minimal artifacts--so they can be ignored. Likewise, the detailed physics of the monitor's operation, from electrical signal to high-voltage electric fields to electrons to light, also can be ignored; instead, the monitor can be treated as a black box that converts an electrical signal to light in a well-defined way. But the greatest simplification is yet to come. Each of the conversions that remain, in the camera, lookup table, and monitor, is represented mathematically by something called a transfer function. A transfer function is nothing more than a way to describe the relationship between what comes out of the conversion and what went into it, and it can be a fairly complex little beastie. The amazing thing is that each of the preceding conversions can almost always be approximated rather well by a very simple transfer function:

output = input^{exponent}

where the output and input values are scaled to the range between 0 and 1. The two scaling factors may be different, even if ``input'' and ``output'' both refer to light; for example, monitors are physically incapable of reproducing the brightness of actual daylight. Even better, since the output of one conversion is the input to the next, these transfer functions combine in a truly simple fashion:

final output = ((input^{exponent1})^{exponent2})^{exponent3} = input^{exponent1*exponent2*exponent3}
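As a quick numerical check of this identity, here is a minimal Python sketch. The three exponent values are purely hypothetical stand-ins for a camera, a lookup table, and a monitor; they are not taken from any real device.

```python
def transfer(value, exponent):
    """Apply a simple power-law transfer function to a value scaled to 0..1."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must be scaled to the range 0..1")
    return value ** exponent


def chain(value, exponents):
    """Feed the output of each conversion into the next one."""
    for exponent in exponents:
        value = transfer(value, exponent)
    return value


# Hypothetical exponents (assumptions for illustration only).
exponents = [0.45, 1.0, 2.5]

# Multiply the exponents together to get the combined exponent.
combined = 1.0
for exponent in exponents:
    combined *= exponent

x = 0.5
step_by_step = chain(x, exponents)   # three conversions, one after another
in_one_go = transfer(x, combined)    # one conversion with the product exponent
```

Up to floating-point rounding, `step_by_step` and `in_one_go` agree, which is all the equation claims.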

This example happens to use three transfer functions, but the relation holds for any number of them. And the best part of all is that our ultimate goal, to have the final, reproduced output light be perceived the same as the original input light, is equivalent to the following trivial equation:

exponent1*exponent2*exponent3 = constant

Or in English: all of the exponents, when multiplied together, must equal a single, constant number. The value of the constant depends on the environments in which the image is captured and viewed, but for movies and slides projected in a dark room, it is usually around 1.5, and for video images shown in typical television or computer environments, it is usually about 1.14. Since the viewing application has the freedom to insert its own conversion with its own exponent, it could, in principle, ensure that the equation holds--if it knew what all the remaining exponents were. But in general, it lacks that knowledge. We'll come back to that in a moment.
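To make the viewing application's position concrete, here is a small Python sketch of the arithmetic it would do if it did know the other exponents. The function name `viewer_exponent` and the camera and monitor values are hypothetical; only the target of 1.14 comes from the text above.

```python
def viewer_exponent(target, other_exponents):
    """Return the exponent the viewer must insert so that the product of
    all exponents in the chain equals the target constant."""
    product = 1.0
    for exponent in other_exponents:
        if exponent <= 0:
            raise ValueError("exponents must be positive")
        product *= exponent
    return target / product


# Hypothetical camera and monitor exponents (assumptions, not measurements).
camera = 0.45
monitor = 2.5

# Target constant for a typical television or computer environment.
needed = viewer_exponent(1.14, [camera, monitor])

# Sanity check: with the viewer's exponent included, the product is 1.14.
end_to_end = camera * needed * monitor
```

The catch, of course, is the one already noted: the sketch assumes the viewer is handed `camera` and `monitor`, which in general it is not.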

In practice, images may be created with any number of tools: an electronic camera; the combination of a classic film-based camera, commercial developing process, and electronic scanner; an image-editing application; or even a completely artificial source such as a ray-tracing program, VRML browser, or fractal generator. To a viewing application, a file is a file; there is rarely any obvious clue as to the true origins of the image. In other words, the decoder can have no reasonable expectation of divining any of the transfer functions that came before the image data was saved to a file, even if it asks the user for help. The decoder's sole concern must therefore be the conversion of samples in the image file to the desired output on the display.

We'll come back and deal with encoders in a little while. For a decoder there are only two cases: either the file contains the additional information about how the samples are related to the desired output, or it doesn't. In the latter case, the decoder is no worse off than it would have been when dealing with a GIF or JPEG image; it can only make a guess about the proper conversion, which in most cases means it does nothing special.

But the case in which the file does contain conversion information is where things finally get interesting. Many types of conversion information are possible, but the simplest is a single number that is usually referred to as gamma. Gamma is a Greek letter (γ) that traditionally represents the exponent in the first equation I gave; the only problem is that, as we've seen, there are several exponents in the end-to-end process, and different people use the term ``gamma'' to mean different things. I will use ``gamma'' to refer to the exponent relating the image data and the desired display output. Not surprisingly, this is how PNG's gAMA chunk defines gamma, too.[78]

[78] Version 1.0 of the PNG specification discussed gamma in terms of the end-to-end transfer function from source to final display. This was deemed impractical and not necessarily indicative of real-world practice, so version 1.1 of the specification clarified all of the gamma-related discussion and reserved the actual term ``gamma'' solely for the usage described here.
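For readers who like to see numbers, the following Python sketch shows one way a decoder might turn a gAMA value into the desired display output. It relies on two details from the PNG specification that are not covered in this section and should be read as assumptions here: the chunk data is a 4-byte big-endian unsigned integer holding gamma times 100000, and gamma is defined by sample = output^gamma, so the desired output is sample^(1/gamma). The function names are hypothetical, and the sketch ignores the display's own transfer function entirely.

```python
import struct


def parse_gama_payload(payload):
    """Decode the 4-byte gAMA chunk data into a floating-point gamma.

    Assumes the data is gamma times 100000 as a big-endian unsigned integer.
    """
    if len(payload) != 4:
        raise ValueError("gAMA chunk data must be exactly 4 bytes")
    (scaled,) = struct.unpack(">I", payload)
    return scaled / 100000.0


def desired_display_output(sample, bit_depth, gamma):
    """Map an integer sample to the desired display output in 0..1.

    Assumes sample = output ** gamma, so output = sample ** (1 / gamma).
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    max_sample = (1 << bit_depth) - 1
    normalized = sample / max_sample
    return normalized ** (1.0 / gamma)


# A gamma of roughly 1/2.2 would be stored as the integer 45455.
payload = struct.pack(">I", 45455)
gamma = parse_gama_payload(payload)

# Desired output for a mid-range 8-bit sample.
mid_gray = desired_display_output(128, 8, gamma)
```

Note that `mid_gray` comes out well below 0.5: a sample halfway up the numeric range corresponds to much less than half of the maximum light output.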




Last Update: 2010-Nov-26