The PNG Guide is an eBook based on Greg Roelofs' book, originally published by O'Reilly.



Chunks

Chunks were designed to be easily tested and manipulated by computer programs, easily detected by human eyes, and reasonably self-contained. Every chunk has the same structure: a 4-byte length (in ``big-endian'' format, as with all integer values in PNG streams), a 4-byte chunk type, between 0 and 2,147,483,647 bytes of chunk data, and a 4-byte cyclic redundancy check value (CRC). This is diagrammed in Figure 8-1.

Figure 8-1

Figure 8-1: PNG chunk structure.

The data field is straightforward; that's where the interesting bits (if any) go; specific content will be discussed later, as each chunk is described. The length field refers to the length of the data field alone, not the chunk type or CRC. The CRC, on the other hand, covers both the chunk-type field and the chunk data and is always present, even when there is no chunk data. Note that the combination of length fields and CRC values is already sufficient to check the basic integrity of a PNG file! The only missing information--not including the contents of the first 8 bytes in the file--is the exact algorithm (or ``polynomial'') used for the CRC. That turns out to be identical to the CRC used by gzip and many popular archiving programs; it is described in detail in Section 3.4 of the PNG Specification, Version 1.1, available from http://www.libpng.org/pub/png/pngdocs.html.

The chunk type is possibly the most unusual feature. It is specified as a sequence of binary values, which just happen to correspond to the upper- and lowercase ASCII letters used on virtually every computer in the Western, non-mainframe world. Since it is far more convenient (and readable) to speak in terms of text characters than numerical sequences, the remainder of this book will adopt the convention of referring to chunks by their ASCII names. Programmers of EBCDIC-based computers should take note of this and remember to use only the numerical values corresponding to the ASCII characters.

Chunk types (or names) are usually mnemonic, as in the case of the IHDR or image header chunk. In addition, however, each character in the name encodes a single bit of information that shows up in the capitalization of the character.[56] Thus IHDR and iHDR are two completely different chunk types, and a decoder that encounters an unrecognized chunk can nevertheless infer useful things about it. From left to right, the four extra bits are interpreted as follows:

[56] The ASCII character set was conveniently designed so that the case of a letter is always determined by bit 5. To put it another way, adding 32 to an uppercase character code gives you the code for its lowercase version.

  • The first character's case bit indicates whether the chunk is critical (uppercase) or ancillary; a decoder that doesn't recognize the chunk type can ignore it if it is ancillary, but it must warn the user that it cannot correctly display the image if it encounters an unknown critical chunk. The tEXt chunk, covered in Chapter 11, "PNG Options and Extensions", is an example of an ancillary chunk.

  • The second character indicates whether the chunk is public (uppercase) or private. Public chunks are those defined in the specification or registered as official, special-purpose types. But a company may wish to encode its own, application-specific information in a PNG file, and private chunks are one way to do that.

  • The case bit of the third character is reserved for use by future versions of the PNG specification. It must be uppercase for PNG 1.0 and 1.1 files, but a decoder encountering an unknown chunk with a lowercase third character should deal with it as with any other unknown chunk.

  • The last character's case bit is intended for image editors rather than simple viewers or other decoders. It indicates whether an editing program encountering an unknown ancillary chunk[57] can safely copy it into the new file (lowercase) or not (uppercase). If an unknown chunk is marked unsafe to copy, then it depends on the image data in some way. It must be omitted from the new image if any critical chunks have been modified in any way, including the addition of new ones or the reordering or deletion of existing ones. Note that if the program recognizes the chunk, it may choose to modify it appropriately and then copy it to the new file. Also note that unsafe-to-copy chunks may be copied to the new file if only ancillary chunks have been modified--again, including addition, deletion, and reordering--which implies that ancillary chunks cannot depend on other ancillary chunks.

    [57] Since any decoder encountering an unknown critical chunk has no idea how the chunk modifies the image--only that it does so in a critical way--an editor cannot safely copy or omit the chunk in the new image.




Last Update: 2010-Nov-26