This clause gives requirements and recommendations for encoder behavior. A PNG encoder shall produce a PNG datastream
from a PNG image that conforms to the format specified in the preceding clauses. Best results will usually be achieved by
following the additional recommendations given here.
See C. Gamma and chromaticity for a brief introduction to gamma issues.
PNG encoders capable of full color management will perform more sophisticated calculations than those described here and
may choose to use the iCCP chunk. If it is known that the image samples conform to the sRGB specification [SRGB],
encoders are strongly encouraged to write the sRGB chunk without performing additional gamma handling. In both cases it
is recommended that an appropriate gAMA chunk be generated for use by PNG decoders that do not recognize the iCCP or
sRGB chunks.
A PNG encoder has to determine:

a) what value to write in the gAMA chunk;

b) how to transform the provided image samples into the values to be written in the PNG datastream.
The value to write in the gAMA chunk is that value which causes a PNG decoder to behave in the desired way. See
13.13 Decoder gamma handling.
The transform to be applied depends on the nature of the image samples and their precision. If the samples represent light
intensity in floating-point or high precision integer form (perhaps from a computer graphics renderer), the encoder may
perform gamma encoding (applying a power function with exponent less than 1) before quantizing the data to integer
values for inclusion in the PNG datastream. This results in fewer banding artifacts at a given sample depth, or allows
smaller samples while retaining the same visual quality. An intensity level expressed as a floating-point value in the range
0 to 1 can be converted to a datastream image sample by:
integer_sample = floor((2^sampledepth - 1) * intensity^encoding_exponent + 0.5)
If the intensity in the equation is the desired output intensity, the encoding exponent is the gamma value to be used in
the gAMA chunk.
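As an illustrative sketch of this conversion (the function and parameter names are not part of this specification), the quantization above can be written as:

```python
import math

def encode_sample(intensity, sample_depth, encoding_exponent):
    """Gamma-encode a linear intensity in [0.0, 1.0] and quantize it to
    an integer sample of the given sample depth, following
    integer_sample = floor((2^sampledepth - 1) * intensity^exponent + 0.5)."""
    max_sample = (1 << sample_depth) - 1
    return math.floor(max_sample * intensity ** encoding_exponent + 0.5)
```

With an encoding exponent of 1/2.2, that same value would be written in the gAMA chunk as the gamma value.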
If the intensity available to the PNG encoder is the original scene intensity, another transformation may be needed. There
is sometimes a requirement for the displayed image to have higher contrast than the original source image. This corresponds
to an end-to-end transfer function from original scene to display output with an exponent greater than 1. In this
case:

gamma = encoding_exponent / end_to_end_exponent
If it is not known whether the conditions under which the original image was captured or calculated warrant such a
contrast change, it may be assumed that the display intensities are proportional to original scene intensities, i.e. the
end-to-end exponent is 1 and hence:
gamma = encoding_exponent
If the image is being written to a datastream only, the encoder is free to choose the encoding exponent. Choosing a value
that causes the gamma value in the gAMA chunk to be 1/2.2 is often a reasonable choice because it minimizes the work
for a PNG decoder displaying on a typical video monitor.
Some image renderers may simultaneously write the image to a PNG datastream and display it on-screen. The displayed pixels
should be gamma corrected for the display system and viewing conditions in use, so that the user sees a proper
representation of the intended scene.
If the renderer wants to write the displayed sample values to the PNG datastream, avoiding a separate gamma encoding
step for the datastream, the renderer should approximate the transfer function of the display system by a power
function, and write the reciprocal of the exponent into the gAMA chunk. This will allow a PNG decoder to reproduce what
was displayed on screen for the originator during rendering.
However, it is equally reasonable for a renderer to compute displayed pixels appropriate for the display device, and to
perform separate gamma encoding for data storage and transmission, arranging to have a value in the gAMA chunk more
appropriate to the future use of the image.
Computer graphics renderers often do not perform gamma encoding, instead making sample values directly proportional
to scene light intensity. If the PNG encoder receives sample values that have already been quantized into integer values,
there is no point in doing gamma encoding on them; that would just result in further loss of information. The encoder
should just write the sample values to the PNG datastream. This does not imply that the gAMA chunk should contain a
gamma value of 1.0, because the desired end-to-end transfer function from scene intensity to display output intensity is
not necessarily linear. However, the desired gamma value is probably not far from 1.0. It may depend on whether the
scene being rendered is a daylight scene or an indoor scene, etc.
When the sample values come directly from a piece of hardware, the correct gAMA value can, in principle, be inferred
from the transfer function of the hardware and the lighting conditions of the scene. In the
case of video digitizers ("frame grabbers"), the samples are probably in the sRGB color space, because the sRGB
specification was designed to be compatible with modern video standards. Image scanners are less predictable. Their output
samples may be proportional to the input light intensity since CCD sensors themselves are linear, or the scanner hardware may
have already applied a power function designed to compensate for dot gain in subsequent printing (an exponent of about 0.57),
or the scanner may have corrected the samples for display on a monitor. It may be necessary to refer to the scanner's manual
or to scan a calibrated target in order to determine the characteristics of a particular scanner. It should be remembered
that gamma relates samples to desired display output, not to scanner input.
Datastream format converters generally should not attempt to convert supplied images to a different gamma. The data
should be stored in the PNG datastream without conversion, and the gamma value should be deduced from information in
the source datastream if possible. Gamma alteration at datastream conversion time causes re-quantization of the set of
intensity levels that are represented, introducing further roundoff error with little benefit. It is almost always better to
just copy the sample values intact from the input to the output file.
If the source datastream describes the gamma characteristics of the image, a datastream converter is strongly
encouraged to write a gAMA chunk. Some datastream formats specify the display exponent (the exponent of the function
which maps image samples to display output rather than the other direction). If the source file's gamma value is greater
than 1.0, it is probably a display exponent, and the reciprocal of this value should be used for the PNG gamma value. If
the source file format records the relationship between image samples and a quantity other than display output, it will
be more complex than this to deduce the PNG gamma value.
If a PNG encoder or datastream converter knows that the image has been displayed satisfactorily using a display system
whose transfer function can be approximated by a power function with exponent display_exponent, the image can be marked
as having the gamma value:

gamma = 1 / display_exponent
It is better to write a gAMA chunk with a value that is approximately correct than to omit the chunk and force PNG
decoders to guess an approximate gamma value. If a PNG encoder is unable to infer the gamma value, it is preferable to
omit the gAMA chunk. If a guess has to be made, this should be left to the PNG decoder.
Gamma does not apply to alpha samples; alpha is always represented linearly.
See also 13.13 Decoder gamma handling.

See C. Gamma and chromaticity for references to color issues.
PNG encoders capable of full color management will perform more sophisticated calculations than those described here and
may choose to use the iCCP chunk. If it is known that the image samples conform to the sRGB specification [SRGB], PNG
encoders are strongly encouraged to use the sRGB chunk.
If it is possible for the encoder to determine the chromaticities of the source display primaries, or to make a strong
guess based on the origin of the image, or the hardware running it, the encoder is strongly encouraged to output the
cHRM chunk. If this is done, the gAMA chunk should also be written; decoders can do little with a cHRM chunk if the
gAMA chunk is missing.
There are a number of recommendations and standards for primaries and
white points, some of which are linked to particular technologies, for example the CCIR 709 standard [ITU-R-BT.709] and
the SMPTE-C standard [SMPTE-170M].
There are three cases that need to be considered:

a) the encoder is part of the generation system;

b) the source image is captured by a camera or scanner;

c) the PNG datastream was generated by translation from some other format.
In the case of hand-drawn or digitally edited images, it is necessary to determine what monitor they were viewed on when
being produced. Many image editing programs allow the type of monitor being used to be specified. This is often because they
are working in some device-independent space internally. Such programs have enough information to write valid cHRM and
gAMA chunks, and are strongly encouraged to do so automatically.
If the encoder is compiled as a portion of a computer image renderer that performs full-spectral rendering, the monitor
values that were used to convert from the internal device-independent color space to RGB should be written into the
cHRM chunk. Any colors that are outside the gamut of the chosen RGB device should be mapped to be within the gamut; PNG
does not store out-of-gamut colors.
If the computer image renderer performs calculations directly in device-dependent RGB space, a cHRM chunk should not be
written unless the scene description and rendering parameters have been adjusted for a particular monitor. In that case,
the data for that monitor should be used to construct a cHRM chunk.
A few image formats store calibration information, which can be used to fill in the cHRM chunk. For example, TIFF 6.0
files [TIFF-6.0] can optionally store calibration information, which if present should be used to construct the cHRM
chunk.
Video created with recent video equipment probably uses the CCIR 709 primaries and D65 white point [ITU-R-BT.709],
which are given in Table 29.

Table 29 — CCIR 709 primaries and D65 white point

        R       G       B       White
x       0.640   0.300   0.150   0.3127
y       0.330   0.600   0.060   0.3290
An older but still very popular video standard is SMPTE-C [SMPTE-170M], given in Table 30.

Table 30 — SMPTE-C video standard

        R       G       B       White
x       0.630   0.310   0.155   0.3127
y       0.340   0.595   0.070   0.3290
It is not recommended that datastream format converters attempt to convert supplied images to a different RGB color
space. The data should be stored in the PNG datastream without conversion, and the source primary chromaticities should
be recorded if they are known. Color space transformation at datastream conversion time is a bad idea because of gamut
mismatches and rounding errors. As with gamma conversions, it is better to store the data losslessly and incur at most
one conversion when the image is finally displayed.
See 13.14 Decoder color handling.
The alpha channel can be regarded either as a mask that temporarily hides transparent parts of the image, or as a means
for constructing a non-rectangular image. In the first case, the color values of fully transparent pixels should be
preserved for future use. In the second case, the transparent pixels carry no useful data and are simply there to fill out
the rectangular image area required by PNG. In this case, fully transparent pixels should all be assigned the same color
value for best compression.
Image authors should keep in mind the possibility that a decoder will not support transparency control in full (see
13.16 Alpha channel processing). Hence, the colors assigned to transparent pixels should be reasonable background colors
whenever feasible.
For applications that do not require a full alpha channel, or cannot afford the price in compression efficiency, the
tRNS transparency chunk is also available.
If the image has a known background color, this color should be written in the bKGD chunk. Even decoders that ignore
transparency may use the bKGD color to fill unused screen area.
If the original image has premultiplied (also called "associated") alpha data, it can be converted to PNG's
non-premultiplied format by dividing each sample value by the corresponding alpha value, then multiplying by the maximum
value for the image bit depth, and rounding to the nearest integer. In valid premultiplied data, the sample values never
exceed their corresponding alpha values, so the result of the division should always be in the range 0 to 1. If the alpha
value is zero, output black (zeroes).
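A minimal sketch of this conversion for a single sample (the function and parameter names are illustrative, not part of this specification):

```python
def unpremultiply(sample, alpha, max_value):
    """Convert one premultiplied ("associated") alpha sample to PNG's
    non-premultiplied form: divide by the alpha value, scale by the
    maximum value for the image bit depth, and round to the nearest
    integer. In valid premultiplied data, sample never exceeds alpha,
    so sample / alpha lies in the range 0 to 1."""
    if alpha == 0:
        return 0  # fully transparent: output black (zeroes)
    return int(sample / alpha * max_value + 0.5)
```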
When encoding input samples that have a sample depth that cannot be directly represented in PNG, the encoder shall scale
the samples up to a sample depth that is allowed by PNG. The most accurate scaling method is the linear equation:
output = floor((input * MAXOUTSAMPLE / MAXINSAMPLE) + 0.5)
where the input samples range from 0 to MAXINSAMPLE and the outputs range from 0 to MAXOUTSAMPLE (which is
2^sampledepth - 1).
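The linear equation can be sketched directly (illustrative names, not part of this specification):

```python
import math

def scale_sample(input_sample, max_in, max_out):
    """Linearly scale a sample from the range 0..max_in to 0..max_out,
    following output = floor((input * MAXOUTSAMPLE / MAXINSAMPLE) + 0.5)."""
    return math.floor(input_sample * max_out / max_in + 0.5)
```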
A close approximation to the linear scaling method is achieved by "left bit replication", which is shifting the valid bits
to begin in the most significant bit and repeating the most significant bits into the open bits. This method is often faster
to compute than linear scaling.
Assume that 5-bit samples are being scaled up to 8 bits. If the source sample value is 27 (in the range from
0-31), then the original bits are:
4 3 2 1 0
---------
1 1 0 1 1
Left bit replication gives a value of 222:
7 6 5 4 3 2 1 0
----------------
1 1 0 1 1 1 1 0
|=======| |===|
    |     Leftmost Bits Repeated to Fill Open Bits
Original Bits
which matches the value computed by the linear equation. Left bit replication usually gives the same value as linear
scaling, and is never off by more than one.
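Left bit replication can be sketched as follows (an illustrative helper, not part of this specification); for the 5-to-8-bit example above it reproduces the value 222:

```python
def left_bit_replicate(sample, in_depth, out_depth):
    """Scale a sample up by shifting its valid bits to the most
    significant positions and repeating them into the low-order
    (open) bits."""
    result = sample << (out_depth - in_depth)
    shift = in_depth
    # Repeatedly copy the filled high bits downward until all
    # out_depth bits are populated.
    while shift < out_depth:
        result |= result >> shift
        shift *= 2
    return result & ((1 << out_depth) - 1)
```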
A distinctly less accurate approximation is obtained by simply left-shifting the input value and filling the low order
bits with zeroes. This scheme cannot reproduce white exactly, since it does not generate an all-ones maximum value; the net
effect is to darken the image slightly. This method is not recommended in general, but it does have the effect of improving
compression, particularly when dealing with greater-than-8-bit sample depths. Since the relative error introduced by
zero-fill scaling is small at high sample depths, some encoders may choose to use it. Zero-fill shall
not
be
used for alpha channel data, however, since many decoders will treat alpha values of all zeroes and all ones as special
cases. It is important to represent both those values exactly in the scaled data.
When the encoder writes an sBIT chunk, it is required to do the scaling in such a way that the high-order bits of the
stored samples match the original data. That is, if the sBIT chunk specifies a sample depth of S, the high-order S bits
of the stored data shall agree with the
original S-bit data values. This allows decoders to recover the original data by shifting right. The added low-order bits are
not constrained. All the above scaling methods meet this restriction.
When scaling up source image data, it is recommended that the low-order bits be filled consistently for all
samples; that is, the same source value should generate the same sample value at any pixel position. This improves
compression by reducing the number of distinct sample values. This is not a mandatory requirement, and some encoders may
choose not to follow it. For example, an encoder might instead dither the low-order bits, improving displayed image quality
at the price of increasing file size.
In some applications the original source data may have a range that is not a power of 2. The linear scaling equation still
works for this case, although the shifting methods do not. It is recommended that an sBIT chunk not be written for such
images, since sBIT suggests that the original data range was exactly 0..2^sampledepth - 1.
Suggested palettes may appear as sPLT chunks in any PNG datastream, or as a PLTE chunk in truecolor PNG datastreams. In
either case, the suggested palette is not an essential part of the image data, but it may be used to present the image
on indexed-color display hardware. Suggested palettes are of no interest to viewers running on truecolor hardware.
When an sPLT chunk is used to provide a suggested palette, it is recommended that the
encoder use the frequency fields to indicate the relative importance of the palette entries, rather than leave them all zero
(meaning undefined). The frequency values are most easily computed as "nearest neighbor" counts, that is, the approximate
usage of each RGBA palette entry if no dithering is applied. (These counts will often be available "for free" as a
consequence of developing the suggested palette.) Because the suggested palette includes transparency information, it should
be computed for the un-composited image.
Even for indexed-color images, sPLT can be used to define alternative reduced palettes for viewers that are unable to
display all the colors present in the PLTE chunk. If the PLTE chunk appears without the bKGD chunk in an image of color
type 6, the circumstances under which the palette was computed are unspecified.
An older method for including a suggested palette in a truecolor PNG datastream uses the PLTE chunk. If this method is
used, the histogram (frequencies) should appear in a separate hIST chunk. The PLTE chunk does not include transparency
information.
Hence for images of color type 6 (truecolor with alpha), it is recommended that a bKGD chunk appear and that the
palette and histogram be computed with reference to the image as it would appear
after compositing against the specified background color. This definition is necessary to ensure that useful palette entries
are generated for pixels having fractional alpha values. The resulting palette will probably be useful only to viewers that
present the image against the same background color. It is recommended that PNG editors delete or recompute the palette
if they alter or remove the bKGD chunk in an image of color type 6.
For images of color type 2 (truecolor), it is recommended that the PLTE and hIST chunks be computed with reference to
the RGB data only,
ignoring any transparent-color specification. If the datastream uses transparency (has a
tRNS chunk), viewers can easily adapt the resulting palette for use with their intended background color (see 13.17
Histogram and suggested palette usage).
For providing suggested palettes, the sPLT chunk is more flexible than the PLTE chunk in the following ways:
With sPLT, multiple suggested palettes may be provided. A PNG decoder may choose an appropriate palette based on name or
number of entries.
In a PNG datastream of color type 6 (truecolor with alpha channel), the PLTE chunk represents a palette already
composited against the bKGD color, so it is useful only for display against that background color. The sPLT chunk
provides an un-composited palette, which is useful for display against backgrounds chosen by the PNG decoder.
Since the sPLT chunk is an ancillary chunk, a PNG editor may add or modify suggested palettes without being forced to
discard unknown unsafe-to-copy chunks.
Whereas the sPLT chunk is allowed in PNG datastreams for color types 0, 3, and 4 (greyscale and indexed-color), the
PLTE chunk cannot be used to provide reduced palettes in these cases.
More than 256 entries may appear in the sPLT chunk.
A PNG encoder that uses the sPLT chunk may choose to write a suggested palette represented by PLTE and hIST chunks as
well, for compatibility with decoders that do not recognize the sPLT chunk.
This specification defines two interlace methods, one of which is no interlacing. Interlacing provides a convenient basis
from which decoders can progressively display an image, as described in
13.10 Interlacing and progressive display.
For images of color type 3 (indexed-color), filter type 0 (None) is usually the most effective. Color images with 256
or fewer colors should almost always be stored in indexed-color format; truecolor format is likely to be much larger.
Filter type 0 is also recommended for images of bit depths less than 8. For low-bit-depth greyscale images, in rare cases,
better compression may be obtained by first expanding the image to 8-bit representation and then applying filtering.
For truecolor and greyscale images, any of the five filters may prove the most effective. If an encoder uses a fixed
filter, the Paeth filter type is most likely to be the best.
For best compression of truecolor and greyscale images, and if compression efficiency is valued over speed of
compression, the recommended approach is adaptive filtering, in which a filter type is chosen for each scanline. Each
image has a different set of filters that performs best for it. An encoder could try every combination of filters to
find what compresses best for a given image. However, when an exhaustive search is unacceptable, the following heuristic
usually performs well enough: compute the output scanline using all five filters, and select the filter that gives the
smallest sum of absolute values of outputs (considering the output bytes as signed differences for this test). This
method usually outperforms any single fixed filter type choice.
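This heuristic can be sketched as follows, assuming the scanline has already been filtered with each of the five filter types (the function and variable names are illustrative, not normative):

```python
def choose_filter(filtered_scanlines):
    """Given the outputs of filter types 0-4 for one scanline (a list
    of byte sequences indexed by filter type), return the type whose
    output has the smallest sum of absolute values, treating each
    output byte as a signed difference in the range -128..127."""
    def cost(row):
        return sum(b if b < 128 else 256 - b for b in row)
    return min(range(len(filtered_scanlines)),
               key=lambda t: cost(filtered_scanlines[t]))
```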
Filtering according to these recommendations is effective in conjunction with either of the two interlace methods defined
in this specification.
The encoder may divide the compressed datastream into IDAT chunks however it wishes. (Multiple IDAT chunks are allowed
so that encoders may work in a fixed amount of memory; typically the chunk size will correspond to the encoder's buffer
size.) A PNG datastream in which each IDAT chunk contains only one data byte is valid, though remarkably wasteful of
space. (Zero-length IDAT chunks are also valid, though even more wasteful.)
A nonempty keyword shall be provided for each text chunk. The generic keyword "Comment" can be used if no better
description of the text is available. If a user-supplied keyword is used, encoders should check that it meets the
restrictions on keywords.
The iTXt chunk uses the UTF-8 encoding of Unicode and thus can store text in any language. The tEXt and zTXt chunks use
the Latin-1 (ISO 8859-1) character encoding, which limits the range of characters that can be used in these chunks.
Encoders should prefer iTXt to tEXt and zTXt chunks, in order to allow a wide range of characters without data loss.
Encoders must convert characters that use local legacy character encodings to the appropriate encoding when storing
text.
When creating iTXt chunks, encoders should follow "UTF-8 encode" in the Encoding Standard. Encoders should discourage
the creation of single lines of text longer than 79 Unicode code points, in order to facilitate easy reading. It is
recommended that text items less than 1024 bytes in size should be output using uncompressed text chunks. It is recommended
that the basic title and author keywords be output using uncompressed text chunks. Placing large text chunks after the
image data (after the IDAT chunks) can speed up image display in some situations, as the decoder will decode the image
data first. It is recommended that small text chunks, such as the image title, appear before the IDAT chunks.
Encoders MAY use private chunks to carry information that need not be understood by other applications. Encoders MAY
use non-reserved field values for experimental or private use.
All ancillary chunks are optional; encoders need not write them. However, encoders are encouraged to write the standard
ancillary chunks when the information is available.
This clause gives some requirements and recommendations for PNG decoder behavior and viewer behavior. A viewer presents
the decoded PNG image to the user. Since viewer and decoder behavior are closely connected, decoders and viewers are treated
together here. The only absolute requirement on a PNG decoder is that it successfully reads any datastream conforming to the
format specified in the preceding chapters. However, best results will usually be achieved by following these additional
recommendations.
PNG decoders shall support all valid combinations of bit depth, color type, compression method, filter method, and
interlace method that are explicitly defined in this International Standard.
Errors in a PNG datastream will fall into two general classes: transmission errors and syntax errors (see 4.10 Error
handling).
Examples of transmission errors are transmission in "text" or "ascii" mode, in which byte codes 13 and/or 10 may be added,
removed, or converted throughout the datastream; unexpected termination, in which the datastream is truncated; or a physical
error on a storage device, in which one or more blocks (typically 512 bytes each) will have garbled or random values. Some
examples of syntax errors are an invalid value for a row filter, an invalid compression method, an invalid chunk length, the
absence of a PLTE chunk before the first IDAT chunk in an indexed image, or the presence of multiple gAMA chunks. A PNG
decoder should handle errors as follows:
Detect errors as early as possible using the PNG signature bytes and CRCs on each chunk. Decoders should verify
that all eight bytes of the PNG signature are correct. A decoder can have additional confidence in the datastream's
integrity if the next eight bytes begin an IHDR chunk with the correct chunk length. A CRC should be checked before
processing the chunk data. Sometimes this is impractical, for example when a streaming PNG decoder is processing a large
IDAT chunk. In this case the CRC should be checked when the end of the chunk is reached.
Recover from an error, if possible; otherwise fail gracefully. Errors that have little or no effect on the processing
of the image may be ignored, while those that affect critical data shall be dealt with in a manner appropriate to the
application.
Provide helpful messages describing errors, including recoverable errors.
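The early-detection checks described above can be sketched as follows (a non-normative sketch; the function name is illustrative and it assumes the start of the datastream is available in memory):

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def check_signature_and_first_chunk(data):
    """Verify the eight PNG signature bytes, then gain additional
    confidence by checking that an IHDR chunk with the correct length
    (13) follows, and that its CRC (computed over the chunk type and
    chunk data fields) matches."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG datastream")
    length, ctype = struct.unpack(">I4s", data[8:16])
    if ctype != b"IHDR" or length != 13:
        raise ValueError("IHDR chunk missing or malformed")
    (crc,) = struct.unpack(">I", data[16 + length:20 + length])
    if zlib.crc32(data[12:16 + length]) != crc:
        raise ValueError("IHDR CRC mismatch")
```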
Three classes of PNG chunks are relevant to this philosophy. For the purposes of this classification, an "unknown chunk"
is either one whose type was genuinely unknown to the decoder's author, or one that the author chose to treat as unknown,
because default handling of that chunk type would be sufficient for the program's purposes. Other chunks are called "known
chunks". Given this definition, the three classes are as follows:
known chunks, which necessarily include all of the critical chunks defined in this specification (IHDR, PLTE, IDAT, and
IEND);

unknown critical chunks (bit 5 of the first byte of the chunk type is 0);

unknown ancillary chunks (bit 5 of the first byte of the chunk type is 1).
See 5.4 Chunk naming conventions for a description of chunk naming conventions.
PNG chunk types are marked "critical" or "ancillary" according to whether the chunks are critical for the purpose of
extracting a viewable image (as with IHDR, PLTE, and IDAT) or critical to understanding the datastream structure (as
with IEND). This is a specific kind of criticality and one that is not necessarily relevant to every conceivable
decoder. For example, a program whose sole purpose is to extract text annotations (for example, copyright information)
does not require a viewable image, but should decode UTF-8 correctly. Another decoder might consider the tRNS and gAMA
chunks essential to its proper execution.
Syntax errors always involve known chunks because syntax errors in unknown chunks cannot be detected. The PNG decoder has
to determine whether a syntax error is fatal (unrecoverable) or not, depending on its requirements and the situation. For
example, most decoders can ignore an invalid IEND chunk; a text-extraction program can ignore the absence of IDAT; an
image viewer cannot recover from an empty PLTE chunk in an indexed image but it can ignore an invalid PLTE chunk in a
truecolor image; and a program that extracts the alpha channel can ignore an invalid gAMA chunk, but may consider the
presence of two tRNS chunks to be a fatal error. Anomalous situations other than syntax errors shall be treated as
follows:
Encountering an unknown ancillary chunk is never an error. The chunk can simply be ignored.
Encountering an unknown critical chunk is a fatal condition for any decoder trying to extract the image from the
datastream. A decoder that ignored a critical chunk could not know whether the image it extracted was the one intended by
the encoder.
A PNG signature mismatch, a CRC mismatch, or an unexpected end-of-stream indicates a corrupted datastream, and may be
regarded as a fatal error. A decoder could try to salvage something from the datastream, but the extent of the damage
will not be known.
When a fatal condition occurs, the decoder should fail immediately, signal an error to the user if appropriate, and
optionally continue displaying any image data already visible to the user (i.e. "fail gracefully"). The application as
a whole need not terminate.
When a non-fatal error occurs, the decoder should signal a warning to the user if appropriate, recover from the error, and
continue processing normally.
When decoding an indexed-color PNG, if out-of-range palette indexes are encountered, decoders have historically varied
in their handling of this error. Displaying the pixel as opaque black is one common error recovery tactic, and is now
required by this specification. Older implementations vary, so encoders must not rely on this behavior.
Decoders that do not compute CRCs should interpret apparent syntax errors as indications of corruption (see also
13.2 Error checking).
Errors in compressed chunks (IDAT, zTXt, iTXt, iCCP) could lead to buffer overruns. Implementors of deflate
decompressors should guard against this possibility.
APNG is designed to allow incremental display of frames before the entire datastream has been read. This implies that
some errors may not be detected until partway through the animation. It is strongly recommended that
when any error is encountered decoders should discard all subsequent frames, stop the animation, and revert to displaying the
static image. A decoder which detects an error before the animation has started should display the static image. An error
message may be displayed to the user if appropriate.
Decoders shall treat out-of-order APNG chunks as an error. APNG-aware PNG editors should restore them to correct order,
using the sequence numbers.
The PNG error handling philosophy is described in 13.1 Error handling.

An unknown chunk type is not to be treated as an error unless it is a critical chunk.
The chunk type can be checked for plausibility by seeing whether all four bytes are in the ranges 41-5A and 61-7A
(hexadecimal), i.e. ASCII uppercase and lowercase letters; note that this need be done only for unrecognized chunk
types. If the total datastream size is known (from file system information, the HTTP protocol, etc.), the chunk length
can be checked for plausibility as well. If CRCs are not checked, dropped or added data bytes or an erroneous chunk
length can cause the decoder to get out of step and misinterpret subsequent data as a chunk header.
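This plausibility check can be sketched as follows (an illustrative helper, not part of this specification):

```python
def plausible_chunk_type(ctype):
    """Return True if all four bytes of the chunk type are ASCII
    letters (hex 41-5A or 61-7A), a useful sanity check when a
    decoder encounters an unrecognized chunk type."""
    return len(ctype) == 4 and all(
        0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A for b in ctype)
```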
For known-length chunks, such as IHDR, decoders should treat an unexpected chunk
length as an error. Future extensions to this specification will not add new fields to existing chunks; instead, new chunk
types will be added to carry new information.
Unexpected values in fields of known chunks (for example, an unexpected compression method in the IHDR chunk) shall be
checked for and treated as errors. However, it is recommended that unexpected field values be treated as fatal errors
only in critical chunks. An unexpected value in an ancillary chunk can be handled by ignoring the whole chunk as though
it were an unknown chunk type. (This recommendation assumes that the chunk's CRC has been verified. In decoders that do
not check CRCs, it is safer to treat any unexpected value as indicating a corrupted datastream.)
Standard PNG images shall be compressed with compression method 0. The compression method field of the IHDR chunk is provided for possible future standardization or proprietary variants. Decoders shall check this byte and report an error if it holds an unrecognized code. See 10. Compression for details.
A PNG datastream is composed of a collection of explicitly typed chunks. Chunks whose contents are defined by the specification could actually contain anything, including malicious code. Similarly, there could be data after the IEND chunk which could contain anything, including malicious code. There is no known risk that such malicious code could be executed on the recipient's computer as a result of decoding the PNG image. However, a malicious application might hide such code inside an innocent-looking image file and then execute it.
The possible security risks associated with future chunk types cannot be specified at this time. Security issues will be
considered when defining future public chunks. There is no additional security risk associated with unknown or unimplemented
chunk types, because such chunks will be ignored, or at most be copied into another PNG datastream.
The iTXt, tEXt, and zTXt chunks contain keywords and data that are meant to be displayed as plain text. The iCCP and sPLT chunks contain keywords that are meant to be displayed as plain text. It is possible that if the decoder displays such text without filtering out control characters, especially the ESC (escape) character, certain systems or terminals could behave in undesirable and insecure ways. It is recommended that decoders filter out control characters to avoid this risk; see 13.7 Text chunk processing.
For the eXIf chunk, the Exif Specification [CIPA-DC-008] does not contain an express requirement that tag "value offset" pointers must actually point to a valid address within the file. This requirement is merely implied. (See Paragraph 4.6.2, which describes the Exif IFD structure.) Regardless, decoders should be prepared to encounter invalid pointers and to handle them appropriately.
Every chunk begins with a length field, which makes it easier to write decoders that are invulnerable to fraudulent chunks that attempt to overflow buffers. The CRC at the end of every chunk provides a robust defence against accidentally corrupted data. The PNG signature bytes provide early detection of common file transmission errors.
A decoder that fails to check CRCs could be subject to data corruption. The only likely consequence of such corruption is incorrectly displayed pixels within the image. Worse things might happen if the CRC of the IHDR chunk is not checked and the width or height fields are corrupted. See 13.2 Error checking.
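The chunk CRC is the standard 32-bit CRC with the reflected polynomial 0xEDB88320, computed over the chunk type and chunk data fields (not the length field). A minimal bitwise sketch follows; a real decoder would normally use a table-driven implementation for speed:

```c
#include <stddef.h>

/* Bitwise CRC-32 as used for PNG chunks: reflected polynomial
   0xEDB88320, initial value and final XOR of all one bits. */
unsigned long crc32_png(const unsigned char *buf, size_t len)
{
    unsigned long c = 0xFFFFFFFFUL;
    for (size_t i = 0; i < len; i++) {
        c ^= buf[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ ((c & 1) ? 0xEDB88320UL : 0);
    }
    return (c ^ 0xFFFFFFFFUL) & 0xFFFFFFFFUL;
}
```

For example, the CRC of the four bytes "IEND" is AE 42 60 82, which is why every minimal PNG file ends with those bytes.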
A poorly written decoder might be subject to buffer overflow, because chunks can be extremely large, up to 2^31-1 bytes long. But properly written decoders will handle large chunks without difficulty.
Some image editing tools have historically performed redaction
by merely setting the alpha channel of the redacted area to zero,
without also removing the actual image data.
Users who rely solely on the visual appearance of such images
run a privacy risk
because the actual image data can be easily recovered.
Similarly, some image editing tools have historically performed clipping
by rewriting the width and height in
IHDR
without re-encoding the image data,
which thus extends beyond the new width and height and may be recovered.
Images with
eXIf
chunks
may contain automatically-included data,
such as photographic GPS coordinates,
which could be a privacy risk if the user is unaware that the PNG image contains this data.
(Other image formats that contain EXIF, such as JPEG/JFIF, have the same privacy risk).
Decoders shall recognize chunk types by a simple four-byte literal comparison; it is incorrect to perform case conversion
on chunk types. A decoder encountering an unknown chunk in which the ancillary bit is 1 may safely ignore the chunk and
proceed to display the image. A decoder trying to extract the image, upon encountering an unknown chunk in which the
ancillary bit is 0, indicating a critical chunk, shall indicate to the user that the image contains information it cannot
safely interpret.
Decoders should test the properties of an unknown chunk type by numerically testing the specified bits. Testing whether a
character is uppercase or lowercase is inefficient, and even incorrect if a locale-specific case definition is used.
Decoders should not flag an error if the reserved bit is set to 1, however, as some future version of the PNG
specification could define a meaning for this bit. It is sufficient to treat a chunk with this bit set in the same way as any
other unknown chunk type.
Decoders do not need to test the chunk type private bit, since it has no functional significance and is used to avoid
conflicts between chunks defined by
W3C
and those defined privately.
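The numeric bit tests described above can be sketched as follows. Bit 5 (value 0x20) of each byte of the chunk type is the case bit; the function names here are illustrative only:

```c
/* Test chunk-type property bits numerically: bit 5 (0x20) of each
   byte is the case bit. No locale-dependent case tests are needed. */
int is_ancillary(const unsigned char type[4])    { return (type[0] & 0x20) != 0; }
int is_private(const unsigned char type[4])      { return (type[1] & 0x20) != 0; }
int is_reserved(const unsigned char type[4])     { return (type[2] & 0x20) != 0; }
int is_safe_to_copy(const unsigned char type[4]) { return (type[3] & 0x20) != 0; }

/* Plausibility check: all four bytes in 41-5A or 61-7A hexadecimal. */
int is_plausible_type(const unsigned char type[4])
{
    for (int i = 0; i < 4; i++) {
        unsigned char c = type[i];
        if (!((c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A)))
            return 0;
    }
    return 1;
}
```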
All ancillary chunks are optional; decoders may ignore them. However, decoders are encouraged to interpret these chunks
when appropriate and feasible.
Non-square pixels can be represented (see 11.3.4.3 pHYs Physical pixel dimensions), but viewers are not required to account for them; a viewer can present any PNG datastream as though its pixels are square.
Where the pixel aspect ratio of the display differs from the aspect ratio of the physical pixel dimensions defined in the
PNG datastream, viewers are strongly encouraged to rescale images for proper display.
When the pHYs chunk has a unit specifier of 0 (unit is unknown), the behavior of a decoder may depend on the ratio of the two pixels-per-unit values, but should not depend on their magnitudes. For example, a pHYs chunk containing (ppuX, ppuY, unit) = (2, 1, 0) is equivalent to one containing (1000, 500, 0); both are equally valid indications that the image pixels are twice as tall as they are wide.
One reasonable way for viewers to handle a difference between the pixel aspect ratios of the image and the display is to
expand the image either horizontally or vertically, but not both. The scale factors could be obtained using the following
floating-point calculations:
image_ratio = pHYs_ppuY / pHYs_ppuX
display_ratio = display_ppuY / display_ppuX
scale_factor_X = max(1.0, image_ratio/display_ratio)
scale_factor_Y = max(1.0, display_ratio/image_ratio)
Because other methods such as maintaining the image area are also reasonable, and because ignoring the
pHYs
chunk is permissible, authors should not assume that all viewing applications will use this scaling
method.
As well as making corrections for pixel aspect ratio, a viewer may have reasons to perform additional scaling both
horizontally and vertically. For example, a viewer might want to shrink an image that is too large to fit on the display, or
to expand images sent to a high-resolution printer so that they appear the same size as they did on the display.
If practical, PNG decoders should have a way to display to the user all the iTXt, tEXt, and zTXt chunks found in the datastream. Even if the decoder does not recognize a particular text keyword, the user might be able to understand it.
When processing tEXt and zTXt chunks, decoders could encounter characters other than those permitted. Some can be safely displayed (e.g., TAB, FF, and CR, hexadecimal 09, 0C, and 0D, respectively), but others, especially the ESC character (hexadecimal 1B), could pose a security hazard (because unexpected actions may be taken by display hardware or software). Decoders should not attempt to directly display any non-Latin-1 characters (except for newline and perhaps TAB, FF, CR) encountered in a tEXt or zTXt chunk. Instead, they should be ignored or displayed in a visible notation such as "\nnn". See 13.3 Security considerations.
When processing iTXt chunks, decoders should follow UTF-8 decode as defined in the Encoding Standard.
Even though encoders are recommended to represent newlines as linefeed (hexadecimal 0A), it is recommended that decoders
not rely on this; it is best to recognize all the common newline combinations (CR, LF, and CR-LF) and display each as a
single newline. TAB can be expanded to the proper number of spaces needed to arrive at a column multiple of 8.
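The newline normalization described above can be sketched as follows (the function name and buffer handling are illustrative only):

```c
#include <stddef.h>

/* Copy text, mapping each of CR, LF, and CR-LF to a single '\n'.
   Returns the number of bytes written to out (at most n bytes).
   A sketch; a real decoder would also expand TAB and filter
   control characters at this point. */
size_t normalize_newlines(const char *in, size_t n, char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\r') {
            out[j++] = '\n';
            if (i + 1 < n && in[i + 1] == '\n')
                i++;               /* swallow the LF of a CR-LF pair */
        } else {
            out[j++] = in[i];
        }
    }
    return j;
}
```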
Decoders running on systems with a non-Latin-1 legacy character encoding should remap character codes so that Latin-1 characters are displayed correctly. Unsupported characters should be replaced with a system-appropriate replacement character (such as U+FFFD REPLACEMENT CHARACTER, U+003F QUESTION MARK, or U+001A SUB) or mapped to a visible notation such as "\nnn". Characters should only be displayed if they are printable characters on the decoding system. Some byte values may be interpreted by the decoding system as control characters; for security, decoders running on such systems should not display these control characters.
Decoders should be prepared to display text chunks that contain any number of printing characters between newline
characters, even though it is recommended that encoders avoid creating lines in excess of 79 characters.
The compression technique used in this specification does not require the entire compressed datastream to be available
before decompression can start. Display can therefore commence before the entire decompressed datastream is available. It is
extremely unlikely that any general purpose compression methods in future versions of this specification will not have this
property.
It is important to emphasize that IDAT chunk boundaries have no semantic significance and can occur at any point in the compressed datastream. There is no required correlation between the structure of the image data (for example, scanline boundaries) and deflate block boundaries or IDAT chunk boundaries. The complete image data is represented by a single zlib datastream that is stored in some number of IDAT chunks; a decoder that assumes any more than this is incorrect. Some encoder implementations may emit datastreams in which some of these structures are indeed related, but decoders cannot rely on this.
To reverse the effect of a filter, the decoder may need to use the decoded values of the prior pixel on the same line, the pixel immediately above the current pixel on the prior line, and the pixel just to the left of the pixel above. This implies that at least one scanline's worth of image data needs to be stored by the decoder at all times. Even though some filter types do not refer to the prior scanline, the decoder will always need to store each scanline as it is decoded, since the next scanline might use a filter type that refers to it. See 7.3 Filtering.
Decoders are required to be able to read interlaced images. If the reference image contains fewer than five columns or
fewer than five rows, some passes will be empty. Encoders and decoders shall handle this case correctly. In particular,
filter type bytes are associated only with nonempty scanlines; no filter type bytes are present in an empty reduced
image.
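The dimensions of each reduced image, and hence whether a pass is empty, follow directly from the interlacing grid. A sketch using the standard Adam7 starting offsets and increments (the helper names are illustrative):

```c
/* Adam7 reduced-image dimensions for a given pass (0-6). A pass is
   empty when either dimension is zero; such passes carry no
   filter-type bytes. */
static const int start_row[7] = { 0, 0, 4, 0, 2, 0, 1 };
static const int start_col[7] = { 0, 4, 0, 2, 0, 1, 0 };
static const int row_inc[7]   = { 8, 8, 8, 4, 4, 2, 2 };
static const int col_inc[7]   = { 8, 8, 4, 4, 2, 2, 1 };

long pass_rows(long height, int pass)
{
    long n = height - start_row[pass];
    return n > 0 ? (n + row_inc[pass] - 1) / row_inc[pass] : 0;
}

long pass_cols(long width, int pass)
{
    long n = width - start_col[pass];
    return n > 0 ? (n + col_inc[pass] - 1) / col_inc[pass] : 0;
}
```

For a 4 x 4 image, for example, the second pass (starting column 4) and the third pass (starting row 4) are both empty.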
When receiving images over slow transmission links, viewers can improve perceived performance by displaying interlaced
images progressively. This means that as each reduced image is received, an approximation to the complete image is displayed
based on the data received so far. One simple yet pleasing effect can be obtained by expanding each received pixel to fill a
rectangle covering the yet-to-be-transmitted pixel positions below and to the right of the received pixel. This process can
be described by the following ISO C code [
ISO_9899
]:
   int starting_row[7]  = { 0, 0, 4, 0, 2, 0, 1 };
   int starting_col[7]  = { 0, 4, 0, 2, 0, 1, 0 };
   int row_increment[7] = { 8, 8, 8, 4, 4, 2, 2 };
   int col_increment[7] = { 8, 8, 4, 4, 2, 2, 1 };
   int block_height[7]  = { 8, 8, 4, 4, 2, 2, 1 };
   int block_width[7]   = { 8, 4, 4, 2, 2, 1, 1 };

   int pass;
   long row, col;

   pass = 0;
   while (pass < 7) {
      row = starting_row[pass];
      while (row < height) {
         col = starting_col[pass];
         while (col < width) {
            visit(row, col,
                  min(block_height[pass], height - row),
                  min(block_width[pass], width - col));
            col = col + col_increment[pass];
         }
         row = row + row_increment[pass];
      }
      pass = pass + 1;
   }
The function visit(row, column, height, width) obtains the next transmitted pixel and paints a rectangle of the specified height and width, whose upper-left corner is at the specified row and column, using the color indicated by the pixel. Note that row and column are measured from 0,0 at the upper left corner.
If the viewer is merging the received image with a background image, it may be more convenient just to paint the received
pixel positions (the
visit()
function sets only the pixel at the specified row and column, not the whole
rectangle). This produces a "fade-in" effect as the new image gradually replaces the old. An advantage of this approach is
that proper alpha or transparency processing can be done as each pixel is replaced. Painting a rectangle as described above
will overwrite background-image pixels that may be needed later, if the pixels eventually received for those positions turn
out to be wholly or partially transparent. This is a problem only if the background image is not stored anywhere
offscreen.
To achieve PNG's goal of universal interchangeability, decoders shall accept all types of PNG image: indexed-color, truecolor, and greyscale. Viewers running on indexed-color display hardware need to be able to reduce truecolor images to indexed-color for viewing. This process is called "color quantization".
A simple, fast method for color quantization is to reduce the image to a fixed palette. Palettes with uniform color
spacing ("color cubes") are usually used to minimize the per-pixel computation. For photograph-like images, dithering is
recommended to avoid ugly contours in what should be smooth gradients; however, dithering introduces graininess that can be
objectionable.
The quality of rendering can be improved substantially by using a palette chosen specifically for the image, since a
color cube usually has numerous entries that are unused in any particular image. This approach requires more work, first in
choosing the palette, and second in mapping individual pixels to the closest available color. PNG allows the encoder to
supply suggested palettes, but not all encoders will do so, and the suggested palettes may be unsuitable in any case (they
may have too many or too few colors). Therefore, high-quality viewers will need to have a palette selection routine at hand.
A large lookup table is usually the most feasible way of mapping individual pixels to palette entries with adequate
speed.
Numerous implementations of color quantization are available. The PNG sample implementation, libpng, includes code for the purpose.
Decoders may wish to scale PNG data to a lesser sample depth (data precision) for display. For example, 16-bit data will
need to be reduced to 8-bit depth for use on most present-day display hardware. Reduction of 8-bit data to 5-bit depth is
also common.
The most accurate scaling is achieved by the linear equation

   output = floor((input * MAXOUTSAMPLE / MAXINSAMPLE) + 0.5)

where

   MAXINSAMPLE  = (2^sampledepth) - 1
   MAXOUTSAMPLE = (2^desired_sampledepth) - 1
A slightly less accurate conversion is achieved by simply shifting right by (sampledepth - desired_sampledepth) places. For example, to reduce 16-bit samples to 8-bit, the low-order byte can be discarded. In many situations the shift method is sufficiently accurate for display purposes, and it is certainly much faster. (But if gamma correction is being done, sample rescaling can be merged into the gamma correction lookup table, as is illustrated in 13.13 Decoder gamma handling.)
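Both conversions can be sketched in integer arithmetic as follows (function names are illustrative; the rounding in the linear version is exact because MAXINSAMPLE is odd):

```c
/* Accurate linear rescale: floor(in * maxout / maxin + 0.5),
   computed exactly in integers. */
unsigned scale_linear(unsigned in, int in_depth, int out_depth)
{
    unsigned maxin  = (1u << in_depth) - 1;
    unsigned maxout = (1u << out_depth) - 1;
    return (in * maxout + maxin / 2) / maxin;
}

/* Faster, slightly less accurate: drop the low-order bits. */
unsigned scale_shift(unsigned in, int in_depth, int out_depth)
{
    return in >> (in_depth - out_depth);
}
```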
If the decoder needs to scale samples up (for example, if the frame buffer has a greater sample depth than the PNG image), it should use linear scaling or left-bit-replication as described in 12.4 Sample depth scaling.
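Left-bit-replication can be sketched as follows (the function name is illustrative):

```c
/* Scale a sample up by replicating its bits leftward, e.g.
   5-bit 11010 -> 8-bit 11010110. Equivalent to exact linear
   scaling when out_depth is a multiple of in_depth. */
unsigned left_bit_replicate(unsigned in, int in_depth, int out_depth)
{
    unsigned out = 0;
    int shift = out_depth - in_depth;
    while (shift >= 0) {
        out |= in << shift;
        shift -= in_depth;
    }
    if (shift > -in_depth)         /* partial copy of the top bits */
        out |= in >> -shift;
    return out;
}
```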
When an sBIT chunk is present, the reference image data can be recovered by shifting right to the sample depth specified by sBIT. Note that linear scaling will not necessarily reproduce the original data, because the encoder is not required to have used linear scaling to scale the data up. However, the encoder is required to have used a method that preserves the high-order bits, so shifting always works. This is the only case in which shifting might be said to be more accurate than linear scaling. A decoder need not pay attention to the sBIT chunk; the stored image is a valid PNG datastream of the sample depth indicated by the IHDR chunk. However, using sBIT to recover the original samples before scaling them to suit the display often yields a more accurate display than ignoring sBIT.
When comparing pixel values to
tRNS
chunk values to detect transparent pixels, the
comparison shall be done exactly. Therefore, transparent pixel detection shall be done before reducing sample precision.
See C. Gamma and chromaticity for a brief introduction to gamma issues.
Viewers capable of full color management will perform more sophisticated calculations than those described here.
For an image display program to produce correct tone reproduction, it is necessary to take into account the relationship between samples and display output, and the transfer function of the display system. This can be done by calculating:

   sample = integer_sample / (2^sampledepth - 1.0)
   display_output = sample^(1.0/gamma)
   display_input = inverse_display_transfer(display_output)
   framebuf_sample = floor((display_input * MAX_FRAMEBUF_SAMPLE) + 0.5)
where integer_sample is the sample value from the datastream, framebuf_sample is the value to write into the frame buffer, and MAX_FRAMEBUF_SAMPLE is the maximum value of a frame buffer sample (255 for 8-bit, 31 for 5-bit, etc.). The first line converts an integer sample into a normalized floating point value (in the range 0.0 to 1.0), the second converts to a value proportional to the desired display output intensity, the third accounts for the display system's transfer function, and the fourth converts to an integer frame buffer sample. Zero raised to any positive power is zero.
A step could be inserted between the second and third to adjust
display_output
to account for the difference
between the actual viewing conditions and the reference viewing conditions. However, this adjustment requires accounting for
veiling glare, black mapping, and color appearance models, none of which can be well approximated by power functions. Such
calculations are not described here. If viewing conditions are ignored, the error will usually be small.
The display transfer function can typically be approximated by a power function with exponent display_exponent, in which case the second and third lines can be merged into:

   display_input = sample^(1.0/(gamma * display_exponent))
                 = sample^decoding_exponent

so as to perform only one power calculation. For color images, the entire calculation is performed separately for R, G, and B values.
The gamma value can be taken directly from the gAMA chunk. Alternatively, an application may wish to allow the user to adjust the appearance of the displayed image by influencing the gamma value. For example, the user could manually set a parameter user_exponent which defaults to 1.0, and the application could set:

   gamma = gamma_from_file / user_exponent
   decoding_exponent = 1.0 / (gamma * display_exponent)
                     = user_exponent / (gamma_from_file * display_exponent)

The user would set user_exponent greater than 1 to darken the mid-level tones, or less than 1 to lighten them.
A gAMA chunk containing zero is meaningless but could appear by mistake. Decoders should ignore it, and editors may discard it and issue a warning to the user.
It is not necessary to perform a transcendental mathematical computation for every pixel. Instead, a lookup table can be computed that gives the correct output value for every possible sample value. This requires only 256 calculations per image (for 8-bit accuracy), not one or three calculations per pixel. For an indexed-color image, a one-time correction of the palette is sufficient, unless the image uses transparency and is being displayed against a nonuniform background.
If floating-point calculations are not possible,
gamma
correction tables can be computed using integer arithmetic
and a precomputed table of logarithms. Example code appears in [
PNG-EXTENSIONS
].
When the incoming image has unknown gamma value (gAMA, sRGB, and iCCP all absent), standalone image viewers should choose a likely default gamma value, but allow the user to select a new one if the result proves too dark or too light. The default gamma value may depend on other knowledge about the image, for example whether it came from the Internet or from the local system. For consistency, viewers for document formats such as HTML, or vector graphics such as SVG, should treat embedded or linked PNG images with unknown gamma value in the same way that they treat other untagged images.
In practice, it is often difficult to determine what value of display exponent should be used. In systems with no built-in
gamma
correction, the display exponent is determined entirely by the
CRT
. A display exponent of 2.2 should be
used unless detailed calibration measurements are available for the particular
CRT
used.
Many modern
frame buffers
have lookup tables that are used to perform
gamma
correction, and on these systems
the display exponent value should be the exponent of the lookup table and
CRT
combined. It may not be possible to find
out what the lookup table contains from within the viewer application, in which case it may be necessary to ask the user to
supply the display system's exponent value. Unfortunately, different manufacturers use different ways of specifying what
should go into the lookup table, so interpretation of the system
gamma value
is system-dependent.
The response of real displays is actually more complex than can be described by a single number (the display exponent). If
actual measurements of the monitor's light output as a function of voltage input are available, the third and fourth lines of
the computation above can be replaced by a lookup in these measurements, to find the actual
frame buffer
value that
most nearly gives the desired brightness.
See C. Gamma and chromaticity for references to color issues.
In many cases, the
image data
in PNG datastreams will be treated as device-dependent RGB values and displayed
without modification (except for appropriate
gamma
correction). This provides the fastest display of PNG images. But
unless the viewer uses exactly the same display hardware as that used by the author of the original image, the colors will
not be exactly the same as those seen by the original author, particularly for darker or near-neutral colors. The
cHRM
chunk provides information that allows closer color matching than that provided by
gamma
correction alone.
The
cHRM
data can be used to transform the
image data
from RGB to XYZ and
thence into a perceptually linear color space such as CIE LAB. The colors can be partitioned to generate an optimal
palette, because the geometric distance between two colors in CIE LAB is strongly related to how different those colors
appear (unlike, for example, RGB or XYZ spaces). The resulting palette of colors, once transformed back into RGB color
space, could be used for display or written into a
PLTE
chunk.
Decoders that are part of image processing applications might also transform
image data
into CIE LAB space for
analysis.
In applications where color fidelity is critical, such as product design, scientific visualization, medicine,
architecture, or advertising, PNG decoders can transform the
image data
from source RGB to the display RGB space of
the monitor used to view the image. This involves calculating the matrix to go from source RGB to XYZ and the matrix to go
from XYZ to display RGB, then combining them to produce the overall transformation. The PNG decoder is responsible for
implementing gamut mapping.
Decoders running on platforms that have a Color Management System (CMS) can pass the image data, gAMA, and cHRM values to the CMS for display or further processing.
PNG decoders that provide color printing facilities can use the facilities in Level 2 PostScript to specify
image
data
in calibrated RGB space or in a device-independent color space such as XYZ. This will provide better color
fidelity than a simple RGB to CMYK conversion. The PostScript Language Reference manual [
PostScript
] gives examples. Such
decoders are responsible for implementing gamut mapping between source RGB (specified in the
cHRM
chunk) and the target printer. The PostScript interpreter is then responsible for producing the required
colors.
PNG decoders can use the cHRM data to calculate an accurate greyscale representation of a color image. Conversion from RGB to grey is simply a case of calculating the Y (luminance) component of XYZ, which is a weighted sum of R, G, and B values. The weights depend upon the monitor type, i.e. the values in the cHRM chunk. PNG decoders may wish to do this for PNG datastreams with no cHRM chunk. In this case, a reasonable default would be the CCIR 709 primaries [ITU-R-BT.709]. The original NTSC primaries should not be used unless the PNG image really was color-balanced for such a monitor.
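With the CCIR 709 primaries and white point, the luminance weights work out to approximately 0.2126, 0.7152, and 0.0722 for R, G, and B respectively. A sketch over linear (gamma-decoded) samples:

```c
/* Greyscale (CIE Y) from linear RGB using the ITU-R BT.709 weights.
   Inputs are linear-light values in 0..1; gamma decoding must be
   applied first if the samples are gamma-encoded. */
double luminance_709(double r, double g, double b)
{
    return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}
```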
The background color given by the
bKGD
chunk will typically be used to fill unused
screen space around the image, as well as any transparent pixels within the image. (Thus,
bKGD
is valid and useful even when the image does not use transparency.) If no
bKGD
chunk is present, the viewer will need to decide upon a suitable background color. When no other
information is available, a medium grey such as 153 in the 8-bit sRGB color space would be a reasonable choice. Transparent
black or white text and dark drop shadows, which are common, would all be legible against this background.
Viewers that have a specific background against which to present the image (such as web browsers) should ignore the
bKGD
chunk, in effect overriding
bKGD
with their
preferred background color or background image.
The background color given by the
bKGD
chunk is not to be considered transparent,
even if it happens to match the color given by the
tRNS
chunk (or, in the case of an
indexed-color
image, refers to a palette index that is marked as transparent by the
tRNS
chunk). Otherwise one would have to imagine something "behind the background" to
composite
against. The background
color is either used as background or ignored; it is not an intermediate layer between the PNG image and some other
background.
Indeed, it will be common that the
bKGD
and
tRNS
chunks specify the same color, since then a decoder that does not implement transparency processing will give the intended
display, at least when no partially-transparent pixels are present.
The alpha channel can be used to
composite
a foreground image against a background image. The PNG datastream
defines the foreground image and the transparency mask, but not the background image. PNG decoders are
not
required to support this most general case. It is expected that most will be able to support compositing against a single
background color.
The equation for computing a
composited
sample value is:
output = alpha * foreground + (1-alpha) * background
where alpha and the input and output sample values are expressed as fractions in the range 0 to 1. This computation should
be performed with intensity samples (not
gamma
-encoded samples). For color images, the computation is done separately
for R, G, and B samples.
The following code illustrates the general case of compositing a foreground image against a background image. It assumes
that the original pixel data are available for the background image, and that output is to a
frame buffer
for display.
Other variants are possible; see the comments below the code. The code allows the sample depths and
gamma values
of
foreground image and background image all to be different and not necessarily suited to the display system. In practice no
assumptions about equality should be made without first checking.
This code is ISO C [
ISO_9899
], with line numbers added for reference in the comments below.
01  int foreground[4];  /* image pixel: R, G, B, A */
02  int background[3];  /* background pixel: R, G, B */
03  int fbpix[3];       /* frame buffer pixel */
04  int fg_maxsample;   /* foreground max sample */
05  int bg_maxsample;   /* background max sample */
06  int fb_maxsample;   /* frame buffer max sample */
07  int ialpha;
08  float alpha, compalpha;
09  float gamfg, linfg, gambg, linbg, comppix, gcvideo;

    /* Get max sample values in data and frame buffer */
10  fg_maxsample = (1 << fg_sample_depth) - 1;
11  bg_maxsample = (1 << bg_sample_depth) - 1;
12  fb_maxsample = (1 << frame_buffer_sample_depth) - 1;

    /* Get integer version of alpha.
     * Check for opaque and transparent special cases;
     * no compositing needed if so.
     */
13  ialpha = foreground[3];

14  if (ialpha == 0) {
       /* Foreground image is transparent here.
        * If the background image is already in the frame
        * buffer, there is nothing to do.
        */
15     ;
16  } else if (ialpha == fg_maxsample) {
       /* Copy foreground pixel to frame buffer. */
17     for (i = 0; i < 3; i++) {
18        gamfg = (float) foreground[i] / fg_maxsample;
19        linfg = pow(gamfg, 1.0 / fg_gamma);
20        comppix = linfg;
21        gcvideo = pow(comppix, 1.0 / display_exponent);
22        fbpix[i] = (int) (gcvideo * fb_maxsample + 0.5);
23     }
24  } else {
       /* Compositing is necessary.
        * Get floating-point alpha and its complement.
        * Note: alpha is always linear; gamma does not
        * affect it.
        */
25     alpha = (float) ialpha / fg_maxsample;
26     compalpha = 1.0 - alpha;
27     for (i = 0; i < 3; i++) {
28        gamfg = (float) foreground[i] / fg_maxsample;
29        linfg = pow(gamfg, 1.0 / fg_gamma);
30        gambg = (float) background[i] / bg_maxsample;
31        linbg = pow(gambg, 1.0 / bg_gamma);
32        comppix = linfg * alpha + linbg * compalpha;
33        gcvideo = pow(comppix, 1.0 / display_exponent);
34        fbpix[i] = (int) (gcvideo * fb_maxsample + 0.5);
35     }
36  }
Variations:
If output is to another PNG datastream instead of a frame buffer, lines 21, 22, 33, and 34 should be changed along the following lines:

   gamout = pow(comppix, outfile_gamma);
   outpix[i] = (int) (gamout * out_maxsample + 0.5);
Also, it becomes necessary to process background pixels when alpha is zero, rather than just skipping pixels. Thus, line 15
will need to be replaced by copies of lines 17-23, but processing background instead of foreground pixel values.
If the sample depths of the output file, foreground file, and background file are all the same, and the three gamma values also match, then the no-compositing code in lines 14-23 reduces to copying pixel values from the input file to the output file if alpha is one, or copying pixel values from background to output file if alpha is zero. Since alpha is typically either zero or one for the vast majority of pixels in an image, this is a significant saving. No gamma computations are needed for most pixels.
When the sample depths and gamma values all match, it may appear attractive to skip the gamma decoding and encoding (lines 28-31, 33-34) and just perform line 32 using gamma-encoded sample values. Although this does not have too bad an effect on image quality, the time savings are small if alpha values of zero and one are treated as special cases as recommended here.
If the original pixel values of the background image are no longer available, only processed frame buffer pixels left by display of the background image, then lines 30 and 31 need to extract intensity from the frame buffer pixel values using code such as:

   gcvideo = (float) fbpix[i] / fb_maxsample;
   linbg = pow(gcvideo, display_exponent);
However, some roundoff error can result, so it is better to have the original background pixels available if at all possible.
Note that lines 18-22 are performing exactly the same
gamma
computation that is done when no alpha channel is
present. If the no-alpha case is handled with a lookup table, the same lookup table can be used here. Lines 28-31 and 33-34
can also be done with (different) lookup tables.
Integer arithmetic can be used instead of floating point, providing care is taken to maintain sufficient precision
throughout.
NOTE In floating point, no overflow or underflow checks are needed, because the input sample values are
guaranteed to be between 0 and 1, and compositing always yields a result that is in between the input values (inclusive).
With integer arithmetic, some roundoff-error analysis might be needed to guarantee no overflow or underflow.
When displaying a PNG image with full alpha channel, it is important to be able to composite the image against some
background, even if it is only black. Ignoring the alpha channel will cause PNG images that have been converted from an
associated-alpha representation to look wrong. (Of course, if the alpha channel is a separate transparency mask, then
ignoring alpha is a useful option: it allows the hidden parts of the image to be recovered.)
Even if the decoder does not implement true compositing logic, it is simple to deal with images that contain only zero and
one alpha values. (This is implicitly true for greyscale and truecolor PNG datastreams that use a tRNS chunk; for
indexed-color PNG datastreams it is easy to check whether the tRNS chunk contains any values other than 0 and 255.) In this
simple case, transparent pixels are replaced by the background color, while others are unchanged.
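The indexed-color check mentioned above can be sketched as a short scan over the tRNS chunk's alpha entries (a hypothetical helper; the name and signature are illustrative):

```c
/* Sketch: return 1 if an indexed-color tRNS chunk contains only the
 * values 0 (fully transparent) and 255 (fully opaque), i.e. the simple
 * replace-with-background path can be used; return 0 if any entry
 * carries partial transparency. */
static int trns_is_binary(const unsigned char *trns, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (trns[i] != 0 && trns[i] != 255)
            return 0;   /* partial transparency present */
    return 1;
}
```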
If a decoder contains only this much transparency capability, it should deal with a full alpha channel by treating all
nonzero alpha values as fully opaque or by dithering. Neither approach will yield very good results for images converted from
associated-alpha formats, but this is preferable to doing nothing. Dithering full alpha to binary alpha is very much like
dithering greyscale to black-and-white, except that all fully transparent and fully opaque pixels should be left unchanged by
the dither.
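One way to dither full alpha to binary alpha, analogous to dithering greyscale to black-and-white, is simple error diffusion. The sketch below (illustrative names; simplified to a single scanline of 8-bit alpha, with error carried only to the next pixel) leaves fully transparent and fully opaque pixels unchanged, as recommended above.

```c
/* Sketch: reduce one row of 8-bit alpha values to binary alpha (0 or
 * 255) by one-dimensional error diffusion.  Pixels that are already 0
 * or 255 are left untouched and do not absorb error. */
static void dither_alpha_row(unsigned char *alpha, int width)
{
    int x, err = 0;
    for (x = 0; x < width; x++) {
        int a = alpha[x];
        if (a == 0 || a == 255)
            continue;               /* leave extremes unchanged */
        a += err;                   /* add error from previous pixel */
        if (a >= 128) {
            err = a - 255;          /* rounded up: carry negative error */
            alpha[x] = 255;
        } else {
            err = a;                /* rounded down: carry positive error */
            alpha[x] = 0;
        }
    }
}
```

A production dither would normally diffuse error in two dimensions (e.g. Floyd-Steinberg), but the structure is the same.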
For viewers running on indexed-color hardware attempting to display a truecolor image, or an indexed-color image
whose palette is too large for the frame buffer, the encoder may have provided one or more suggested palettes in
sPLT chunks. If one of these is found to be suitable, based on size and perhaps name, the PNG decoder can use that
palette. Suggested palettes with a sample depth different from what the decoder needs can be converted using sample
depth rescaling (see 13.12 Sample depth rescaling).
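For example, rescaling a 16-bit sPLT sample to the 8-bit depth a decoder needs amounts to linear scaling with rounding, floor(v * 255 / 65535 + 0.5), which can be done entirely in integers (an illustrative helper, not normative code):

```c
/* Sketch: rescale one 16-bit suggested-palette sample to 8 bits by
 * linear scaling with rounding.  v16*255 is at most 65535*255, which
 * fits in a 32-bit unsigned integer. */
static unsigned char rescale_16_to_8(unsigned int v16)
{
    return (unsigned char)((v16 * 255u + 32767u) / 65535u);
}
```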
When the background is a solid color, the viewer should composite the image and the suggested palette against that
color, then quantize the resulting image to the resulting RGB palette. When the image uses transparency and the background
is not a solid color, no suggested palette is likely to be useful.
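Compositing the suggested palette against a solid background means compositing each RGBA entry individually. A minimal sketch for 8-bit entries (gamma handling omitted for brevity; the function name is illustrative):

```c
/* Sketch: composite one RGBA suggested-palette entry against a solid
 * background color, yielding an opaque RGB entry.  Integer arithmetic
 * with rounding; 8-bit samples throughout. */
static void composite_palette_entry(const unsigned char rgba[4],
                                    const unsigned char bg[3],
                                    unsigned char out[3])
{
    int i;
    unsigned int a = rgba[3];
    for (i = 0; i < 3; i++)
        out[i] = (unsigned char)
            ((a * rgba[i] + (255u - a) * bg[i] + 127u) / 255u);
}
```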
For truecolor images, a suggested palette might also be provided in a PLTE chunk. If the image has a tRNS
chunk and the background is a solid color, the viewer will need to adapt the suggested palette for use with its desired
background color. To do this, the palette entry closest to the tRNS color should be replaced with the desired background
color; or alternatively a palette entry for the background color can be added, if the viewer can handle more colors than
there are PLTE entries.
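Finding the palette entry closest to the tRNS color can be done with a simple nearest-match scan; squared RGB distance is one reasonable (if perceptually crude) metric. A hypothetical sketch:

```c
#include <limits.h>

/* Sketch: return the index of the palette entry nearest (in squared
 * RGB distance) to the target color, e.g. the tRNS color, so that the
 * viewer can replace that entry with its desired background color. */
static int nearest_palette_entry(unsigned char palette[][3], int n,
                                 const unsigned char target[3])
{
    int i, best = 0;
    long bestdist = LONG_MAX;
    for (i = 0; i < n; i++) {
        long dr = (long)palette[i][0] - target[0];
        long dg = (long)palette[i][1] - target[1];
        long db = (long)palette[i][2] - target[2];
        long d = dr * dr + dg * dg + db * db;
        if (d < bestdist) {
            bestdist = d;
            best = i;
        }
    }
    return best;
}
```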
For images of color type 6 (truecolor with alpha), any PLTE chunk should have been designed for display of the
image against a uniform background of the color specified by the bKGD chunk. Viewers should probably ignore the
palette if they intend to use a different background, or if the bKGD chunk is missing. Viewers can use a suggested
palette for display against a different background than it was intended for, but the results may not be very good.
If the viewer presents a transparent truecolor image against a background that is more complex than a uniform
color, it is unlikely that the suggested palette will be optimal for the composite image. In this case it is best to
perform a truecolor compositing step on the truecolor PNG image and background image, then color-quantize
the resulting image.
In truecolor PNG datastreams, if both PLTE and sPLT chunks appear, the PNG decoder may choose from
among the palettes suggested by both, bearing in mind the different transparency semantics described above.
The frequencies in the sPLT and hIST chunks are useful when the viewer cannot provide as many colors as are
used in the palette in the PNG datastream. If the viewer has a shortfall of only a few colors, it is usually adequate to
drop the least-used colors from the palette. To reduce the number of colors substantially, it is best to choose entirely
new representative colors, rather than trying to use a subset of the existing palette. This amounts to performing a new
color quantization step; however, the existing palette and histogram can be used as the input data, thus avoiding a scan
of the image data in the IDAT chunks.
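Dropping the least-used colors using the histogram can be sketched as a simple repeated selection over the hIST frequencies (illustrative names; for the small palette sizes involved, the quadratic scan is harmless):

```c
/* Sketch: given per-entry frequencies (e.g. from a hIST chunk), mark
 * the `drop` least-used palette entries for removal when the display
 * can show slightly fewer colors than the palette contains. */
static void mark_least_used(const unsigned int freq[], int n,
                            int drop, unsigned char dropped[])
{
    int i, j;
    for (i = 0; i < n; i++)
        dropped[i] = 0;
    for (j = 0; j < drop; j++) {
        int min_i = -1;
        for (i = 0; i < n; i++)
            if (!dropped[i] && (min_i < 0 || freq[i] < freq[min_i]))
                min_i = i;          /* least-used entry not yet marked */
        if (min_i >= 0)
            dropped[min_i] = 1;
    }
}
```

Pixels that used a dropped entry would then be remapped to the nearest surviving color.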
If no suggested palette is provided, a decoder can develop its own, at the cost of an extra pass over the image
data in the IDAT chunks. Alternatively, a default palette (probably a color cube) can be used.
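A common choice of default palette is a 6x6x6 color cube, 216 entries with each channel taking one of six evenly spaced levels. A minimal sketch (the function name and entry ordering are illustrative):

```c
/* Sketch: fill a 216-entry palette with a 6x6x6 color cube.  Each
 * channel steps through the levels 0, 51, 102, 153, 204, 255. */
static void build_color_cube(unsigned char palette[216][3])
{
    int r, g, b, i = 0;
    for (r = 0; r < 6; r++)
        for (g = 0; g < 6; g++)
            for (b = 0; b < 6; b++) {
                palette[i][0] = (unsigned char)(r * 51);
                palette[i][1] = (unsigned char)(g * 51);
                palette[i][2] = (unsigned char)(b * 51);
                i++;
            }
}
```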
See also 12.5 Suggested palettes.