Intro to PNG Features

A Basic Introduction to PNG Features
This page is intended to provide an explanation of some of the features of
the PNG format for non-technical users. As such, it doesn't emphasize PNG
features like freedom from patents; those are more of concern to developers.
Where programmer information is given, it is principally to explain to the
user why various applications may not perform as well as expected.
Where performance claims are made--especially compression comparisons with
other image formats--we assume that the PNG implementation is at least as
good as the best freeware encoders. Note that this is currently not
necessarily a valid assumption in the case of a number of popular (and
expensive) image editors, but it's not always clear where the problem lies.
Please let Greg know
if parts of this page still don't make sense or if there are other PNG
features and/or foibles that aren't covered here. Greg would like this
to be a friendly and usable resource for non-experts.
A Russian translation of this page (with some additional information) is
available here:
(Thanks to Ivan Zenkov for the translation!)
Finally, there are a number of third-party pages that provide different
and complementary perspectives on PNG:
Kerry Watson has created a wonderfully easy-to-read, multi-page intro
covering compression, transparency, interlacing, a selection of Windows
software, and comparisons to other formats as part of his Web Colors site.
Vincent Sabio wrote a nice PNG summary and review that is more detailed than
this page but not as detailed as the full PNG specification.
Stephan T. Lavavej has created an excellent PNG introduction page that
includes a lovely demo of interpolated display of interlaced PNGs (i.e.,
smooth and fuzzy rather than blocky).
Drake Emko, co-author of the Hackles web comic, has written an informative
article entitled PNG Tips for Cartoonists. (See also the list of PNG comics.)
Tony "CraniumAbuse" Turner has written up a nice format-comparison
introduction as part of his How To Optimize a PNG Image File Using Paint Shop
Pro tutorial.
Typical Usage
The Portable Network Graphics (PNG) format was designed to replace the older
and simpler GIF format and, to some extent, the much more complex TIFF format.
(See the main page or the history page for background information.) Here
we'll concentrate on two major uses: the World Wide Web (WWW) and image-editing.
For the Web, PNG really has three main advantages over GIF: alpha channels
(variable transparency), gamma correction (cross-platform control of image
brightness), and two-dimensional interlacing (a method of progressive display).
PNG also compresses better than GIF in almost every case, but the difference
is generally only around 5% to 25%, not a large enough factor to encourage
folks to switch on that basis alone. One GIF feature that PNG does not try
to reproduce is multiple-image support, especially animations; PNG was and is
intended to be a single-image format only. (A very PNG-like extension format
called MNG was finalized in early 2001 and is beginning to be supported by
various applications, but MNGs and PNGs have different file extensions and
different purposes.)
For image editing, either professional or otherwise, PNG provides a useful
format for the storage of intermediate stages of editing. Since PNG's
compression is fully lossless--and since it supports up to 48-bit truecolor
or 16-bit grayscale--saving, restoring and re-saving an image will not degrade
its quality, unlike standard JPEG (even at its highest quality settings).
And unlike TIFF, the PNG specification leaves no room for implementors to
pick and choose what features they'll support; the result is that a PNG image
saved in one app is readable in any other PNG-supporting application.
(Note that for transmission of finished truecolor images--especially
photographic ones--JPEG is almost always a better choice. Although
JPEG's lossy compression can introduce visible artifacts, these can be
minimized, and the savings in file size even at high quality levels is
much better than is generally possible with a lossless format like PNG.
And for black-and-white images, particularly of text or drawings, TIFF's
Group 4 fax compression or the JBIG format are often far better than
1-bit grayscale PNG.)
Like GIF and TIFF, PNG is a raster format, which is to say, it
represents an image as a two-dimensional array of colored dots (pixels).
PNG is explicitly not a vector format, i.e., one that can store
shapes (lines, boxes, ellipses, etc.) and be scaled arbitrarily without
any loss of quality (generally speaking). For that you probably want SVG
or PostScript. (There are some private extensions to PNG that add
vector information in addition to PNG's regular pixels--Macromedia's
Fireworks does something along those lines--but no valid PNG may omit
the pixel data.)
Compression
PNG's compression is among the best that can be had without losing image
information and without paying patent fees, but not all implementations
take full advantage of the available power. Even those that do can be
thwarted by unwise choices on the part of the user.
PNG supports three main image types: truecolor, grayscale and palette-based
("8-bit"). JPEG only supports the first two; GIF only the third (although
it can fake grayscale by using a gray palette). The impact on compression
comes from the ability to mix up image types in PNG. Specifically, forcing an
application to save an 8-bit palette image as a 24-bit truecolor (or "RGB")
image is not going to result in a small file. This may be unavoidable
if the original has been modified to include more than 256 colors (for example,
if a continuous gradient background has been added), but many images intended
for the Web have 256 or fewer colors.
On the programmer's side, one common mistake is to include too many palette
entries in a PNG image. This error is most noticeable when converting tiny
GIF images (bullets, buttons, etc.) to PNG format; these images are typically
only 1000 bytes or so in size, and storing 256 three-byte palette entries
where only 50 are needed would result in over 600 bytes of wasted space.
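The arithmetic behind that figure can be sketched as follows. The function name is hypothetical; the only assumption is the one stated above, namely that each palette entry occupies three bytes (red, green and blue).

```python
def wasted_palette_bytes(stored_entries: int, used_entries: int) -> int:
    """Bytes spent on palette entries that no pixel ever references."""
    bytes_per_entry = 3  # one byte each for red, green and blue
    return (stored_entries - used_entries) * bytes_per_entry


# A full 256-entry palette stored for an image that needs only 50 colors.
print(wasted_palette_bytes(256, 50))  # 618
```

For an image of only 1000 bytes or so, those 618 bytes are well over half the file.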
Another common programmer mistake is to use only one type of compression
filter, or to vary them incorrectly. Compression filters are described
below and can make a dramatic difference in the compressibility of the
image. In general this is not a feature that users should be forced to
experiment with.
Finally, the low-level compression engine itself can be tweaked to compress
either better or faster. Often "best compression" is the preferred
setting, but an implementor may choose to use an intermediate level of
compression in order to boost the interactive performance for the user.
Usually the difference in file size is small, but there are cases where
such a choice can make a big difference.
See the zlib home page for further details on PNG's compression engine and
the CRC-32 algorithm, the 7-Zip home page for an alternative implementation of
the deflate algorithm, and Vince Sabio's Compression Primer for an overview of
compression in general. For tools to optimize the compression of PNG images,
see the converters page (especially Glenn Randers-Pehrson's pngcrush and
Ulead's SmartSaver).
Compression Filters
Compression filters are a way of transforming the image data (losslessly)
so that it will compress better. Each horizontal line in the image can have
one of five filter types associated with it; choosing which of the five
to use for each line is almost more of a black art than a science.
Nevertheless, at least one reasonably good algorithm is not only known but
also described in the PNG specification and implemented in freely available
software. Other algorithms are likely to perform even better, but so far this
has not been an active area of research.
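To give a feel for why filtering helps, here is a minimal sketch of one of the five filter types, the "Sub" filter, which stores each byte as its difference from the byte to its left. It is a toy: it assumes one byte per pixel, applies a single filter to a single line, and is not the per-line selection algorithm described in the specification. The function names are hypothetical; the compressor is Python's standard zlib module.

```python
import zlib


def sub_filter(row: bytes) -> bytes:
    """Store each byte as its difference (mod 256) from the byte to its left."""
    out = bytearray(len(row))
    prev = 0
    for i, value in enumerate(row):
        out[i] = (value - prev) & 0xFF
        prev = value
    return bytes(out)


def sub_unfilter(filtered: bytes) -> bytes:
    """Undo sub_filter exactly; the transformation loses nothing."""
    out = bytearray(len(filtered))
    prev = 0
    for i, delta in enumerate(filtered):
        prev = (prev + delta) & 0xFF
        out[i] = prev
    return bytes(out)


# A smooth grayscale ramp: every value differs from its neighbor by one.
row = bytes(range(256)) * 16

plain = zlib.compress(row, 9)
filtered = zlib.compress(sub_filter(row), 9)
print(len(row), len(plain), len(filtered))
print(sub_unfilter(sub_filter(row)) == row)  # True
```

After filtering, the ramp becomes a long run of identical small numbers, which the compressor handles far better than 256 distinct values.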
By way of example--admittedly an extreme and unrealistic case--a
512 x 32,768 image containing all 16,777,216 possible 24-bit colors compressed
over 300 times better with filtering than without. The uncompressed image was
48 MB in size; the compressed-but-unfiltered version was around 36 MB; but the
filtered version is only 115,989 bytes (0.1 MB). Yow. (A 4096 x 4096 version,
created by Paul Schmidt, is a mere 59,852 bytes--more than 600 times better
than the unfiltered version, at an overall compression ratio of 841:1. Ted
Samuels ran it through Ken Silverman's PNGOUT utility--see the converters page
for links to it and other optimizers--and trimmed it to 57,549 bytes, for an
overall 875:1 ratio. See this page for a downloadable version and further
info.)
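The overall ratios quoted above follow directly from the image dimensions. A short check, using only the figures given in the text (the variable and function names are made up for the illustration):

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 4096, 4096, 3  # 24-bit truecolor

uncompressed = WIDTH * HEIGHT * BYTES_PER_PIXEL
print(uncompressed)               # 50331648
print(uncompressed // 2**20)      # 48 (MB)


def overall_ratio(compressed_bytes: int) -> int:
    """Uncompressed size divided by file size, rounded to the nearest integer."""
    return round(uncompressed / compressed_bytes)


print(overall_ratio(59_852))  # 841
print(overall_ratio(57_549))  # 875
```

The 512 x 32,768 layout holds the same 16,777,216 pixels, so its uncompressed size is identical.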
A more realistic example is the oceanography data at NASA's Ocean ESIP site.
Digital maps displaying various physical measurements can be generated
dynamically in either GIF or PNG format; the PNG versions are typically about
one-fifth the size of the GIFs, thanks to PNG's compression filters. For
example, a map showing the surface height of the northeastern Pacific Ocean on
1 August 1997 (during a major El Niño) is 70,090 bytes in GIF format but only
13,880 bytes in PNG format.
See the Filter Algorithms chapter of the PNG specification for details.
As a measure of just how unrealistic the all-colors example is, note that
these seemingly hyper-compressed PNG images can themselves be compressed by
an additional factor of anywhere from 21 to 97 or so (depending on which
image) simply by applying gzip to them. Of course, a gzip'd PNG is not
terribly useful in most contexts, and MNG is the best of all--it drops the
size to 456 bytes.
Alpha Channels
Also known as a mask channel, an alpha channel is simply a way to
associate variable transparency with an image. Whereas GIF supports simple
binary transparency--any given pixel can be either fully transparent or
fully opaque--PNG allows up to 254 levels of partial transparency in between
for "normal" images (or 65,534 levels for the special "deeply insane"
formats, but here we're concentrating on image depths that are useful on
the Web).
All three PNG image types--truecolor, grayscale and palette--can have alpha
information, but it's most commonly used with truecolor images. Instead
of storing three bytes for every pixel (red, green and blue), now four are
stored: red, green, blue and alpha, or RGBA. The variable transparency
allows you to create "special effects" that will look good on any background,
whether light, dark or patterned. For example, a photo-vignette effect can
be created for a portrait by making a central oval region fully opaque (i.e.,
for the face and shoulders), the outer regions fully transparent, and a
transition region that varies smoothly between the two extremes. In a Web
browser such as Arena, the portrait would fade smoothly to white against a
white background, or smoothly to black against a black background.
Drop-shadows are another ideal application for alpha
transparency; in the images below, the same toucan image is displayed against
a colorful background and against another copy of itself:
Stefan Schneider's shadow-casting toucan displayed against different backgrounds
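The blending that a viewer performs for each pixel can be sketched roughly as follows. This is a simplified illustration, not any particular browser's code: it assumes 8-bit samples, a straight linear blend, and no gamma handling, and the function name is hypothetical.

```python
def composite(foreground, alpha, background):
    """Blend an RGB foreground pixel over an RGB background pixel.

    alpha runs from 0 (fully transparent) to 255 (fully opaque).
    """
    return tuple(
        (f * alpha + b * (255 - alpha) + 127) // 255  # +127 rounds to nearest
        for f, b in zip(foreground, background)
    )


shadow = (0, 0, 0)  # a black drop-shadow pixel
print(composite(shadow, 255, (255, 255, 255)))  # (0, 0, 0): opaque, background hidden
print(composite(shadow, 0, (255, 255, 255)))    # (255, 255, 255): background shows through
print(composite(shadow, 128, (255, 255, 255)))  # (127, 127, 127): half-way gray on white
print(composite(shadow, 128, (200, 40, 40)))    # same pixel, darkened red on a red page
```

Because the background is an input to the blend, the same image data produces a plausible shadow on any page color.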
Transparency matters even more for the small graphics typically used on web
pages, such as colored (circular) bullets and fancy text. Alpha blending
allows one to use anti-aliasing--creating the illusion of smooth curves on a
grid of rectangular pixels by smoothly varying the pixels' colors--to make
rounded and curved images that look good against any background, not just
against a white background (for example). Thus the same image can be reused in
many places without the "ghosting" effect that occurs with GIFs.
Of course, effective replacements for GIF buttons and icons must be comparable
in size as well, and that mostly rules out truecolor RGBA images. But PNG
supports alpha information with palette images as well; it's just slightly
harder to implement in a smart way. A PNG alpha-palette image is just that:
an image whose palette also has alpha information associated with it, not a
palette image with a full alpha mask. In other words, each pixel corresponds
to an entry in the palette with red, green, blue and alpha components.
So if you want to have bright red pixels with four different levels of
transparency, you must use four separate palette entries to accommodate them.
(All four entries will have identical RGB components, but the alpha values
will differ.) If you want all of your colors to have four levels of
transparency, you've effectively reduced your total number of available colors
from 256 to 64. In general, though, only some of the colors need more than
one level of transparency, and recognizing which ones is where things get
tricky for the programmer. (If you don't want to trust your local programmer,
have a look at pngquant, which converts 32-bit RGBA PNGs into 8-bit
RGBA-palette images. If you are a programmer, also have a look at it; full
source code is included.)
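The palette bookkeeping described above can be illustrated with a small sketch. The helper name and the particular alpha values are made up for the example; this shows only the counting, not how pngquant or any other tool actually chooses its entries.

```python
MAX_PALETTE_ENTRIES = 256


def palette_entries(colors, alpha_levels):
    """One palette entry per (color, alpha) combination."""
    return [(r, g, b, a) for (r, g, b) in colors for a in alpha_levels]


bright_red = (255, 0, 0)
levels = (64, 128, 192, 255)  # four arbitrary transparency levels

entries = palette_entries([bright_red], levels)
print(entries)
# [(255, 0, 0, 64), (255, 0, 0, 128), (255, 0, 0, 192), (255, 0, 0, 255)]

# If every color needs all four levels, this many distinct colors fit:
print(MAX_PALETTE_ENTRIES // len(levels))  # 64
```

A smarter encoder would spend extra entries only on the colors that actually appear at more than one transparency level.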
For a better explanation with some nice sample images, see the Anti-aliasing
and Transparency chapter of Chris Lilley's excellent WWW4 paper, Not Just
Decoration: Quality Graphics for the Web.
Gamma Correction
Gamma correction basically refers to the ability to correct for differences
in how computers (and especially computer monitors) interpret color values.
Web authors in particular are probably aware that Macintosh-generated images
tend to look too dark on PCs, and PC-generated images tend to look too light
on Macs. An image that looks good on an SGI workstation won't look right on
either a Macintosh or a PC, and even a PC-created image won't look right on
all PCs.
Gamma information is a partial solution. It's a means of associating a
single number with a computer display system, in an attempt to characterize
the tricky physics lurking within a graphics card's digital-to-analog
converter (RAMDAC) and within a monitor's high-voltage electron gun.
Gamma is only an approximation; a better approximation is to use so-called
chromaticity values (also supported by PNG) as well as gamma, but
even this is an approximation. The absolute best solution currently available
is to use a complete color management system (which, again, PNG supports
via the sRGB extension chunk). For most people, however, just supplying
the gamma value of the image and correcting for the corresponding gamma
value of the monitor system is sufficient.
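As a rough sketch of what "correcting for the monitor" means, the following applies a pure power-law adjustment to a single 8-bit sample, using the relation decoding exponent = 1 / (file gamma x display exponent) from the Gamma Tutorial in the PNG specification. The function name is hypothetical, and the numeric values (a file gamma of 1/2.2 and display exponents of 2.2 and 1.8) are illustrative assumptions, not figures from this page.

```python
def gamma_correct(sample: int, file_gamma: float, display_exponent: float) -> int:
    """Adjust one 8-bit sample for a given display, using a power-law model."""
    decoding_exponent = 1.0 / (file_gamma * display_exponent)
    return round(255 * (sample / 255) ** decoding_exponent)


FILE_GAMMA = 1 / 2.2  # assumed: image prepared for a display exponent of 2.2

# Shown on the kind of display it was prepared for, the sample is unchanged.
print(gamma_correct(128, FILE_GAMMA, 2.2))  # 128

# On a display with a smaller exponent the image would look too light,
# so the corrected sample comes out darker than the stored one.
print(gamma_correct(128, FILE_GAMMA, 1.8))
```

Pure black and pure white are unaffected by the correction; only the mid-tones move.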
For further information, see Chris Lilley's tutorials on gamma, chromaticity
and color management, or the Gamma Tutorial appendix in the PNG specification.
For more detailed technical information, see Charles Poynton's Gamma and Color
FAQs, the International Color Consortium home page, the sRGB home page, John
Denker's extensive color management page, or Chris's chapter on gamma
correction (and subsequent chapters) in Not Just Decoration: Quality Graphics
for the Web.
Gamma logo courtesy of Claus Cyrny.
Interlacing
Interlacing--or, more generally, progressive display--has been around a long
time. GIF has supported it since 1987, TIFF since around the same time (though
not in any standardized way), and JPEG since the early 1990s (though
it wasn't widely implemented until 1996). PNG's method is conceptually similar
to GIF interlacing and visually similar to progressive JPEG (i.e.,
two-dimensional).
Here is a GIF animation by Willem van Schaik that shows some of the benefits
of PNG's 2D interlacing scheme over GIF's one-dimensional version:
PNG's 2D interlacing (left) compared with GIF's 1D interlacing (right)
The first thing to notice is that only the top one-eighth or so of the GIF
image is visible by the time the PNG image's first pass is complete. PNG's
first pass is only 1/64th of the image data; GIF's is 1/8th. By the time
GIF's first pass is done, four PNG passes have been displayed--and unlike the
GIF pixels, which are stretched by a factor of 8:1 at this point, the PNG
pixels are only stretched by 2:1. (Indeed, there is no stretching at all
in PNG's odd-numbered passes, and its even passes are all stretched by 2:1
vertically. This means that embedded text in an image is typically readable
about twice as fast in a PNG image.)
Also note that PNG's seventh pass and GIF's fourth pass are identical--both
consist of every other scanline. They each therefore represent fully one half
of the image data and one half of the decoding time. (The relative timing in
the animation above has been adjusted to emphasize the earlier passes over the
later ones.)
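The fractions quoted above can be checked with a short sketch of PNG's seven-pass scheme (commonly called Adam7). The starting offsets and step sizes in the table are taken from the PNG specification rather than from this page; the function name is hypothetical.

```python
# (x_start, y_start, x_step, y_step) for passes 1 through 7
ADAM7_PASSES = [
    (0, 0, 8, 8),
    (4, 0, 8, 8),
    (0, 4, 4, 8),
    (2, 0, 4, 4),
    (0, 2, 2, 4),
    (1, 0, 2, 2),
    (0, 1, 1, 2),
]


def pass_pixel_counts(width: int, height: int) -> list:
    """Number of pixels delivered by each of the seven passes."""
    counts = []
    for x0, y0, dx, dy in ADAM7_PASSES:
        columns = len(range(x0, width, dx))
        rows = len(range(y0, height, dy))
        counts.append(columns * rows)
    return counts


counts = pass_pixel_counts(64, 64)
total = sum(counts)
print(counts)                  # [64, 64, 128, 256, 512, 1024, 2048]
print(counts[0] / total)       # 0.015625, i.e. 1/64 of the image
print(sum(counts[:4]) / total) # 0.125, the same 1/8 as GIF's first pass
print(counts[6] / total)       # 0.5, the final pass is half the image
```

The last pass starts on the second scanline and takes every other row, which is why it matches GIF's fourth pass.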
Check out the PNG interlacing demo for a "zoomed" look at how PNG's interlaced
pixels are displayed, or see the Data Representation chapter of the PNG
specification for details of PNG's interlacing scheme.
File Integrity Checks
PNG supports three main types of integrity-checking to help avoid problems
with file transfers and the like. The first and simplest is the eight-byte
magic signature at the beginning of every PNG image. It will detect
the most common type of file corruption: that due to the transfer of a binary
file in text (or "ASCII") mode. On most systems, line-endings in text files
are flagged by either a carriage-return character (CR), a line-feed character
(LF), or both. Macintoshes use CRs; Unix systems use LFs; and all non-Unix
PC systems (DOS, Windows 3.x/95/NT, OS/2) use CR/LF pairs. PNG's magic
signature cleverly includes both a CR/LF pair and a single LF. Thus when
transferring in text mode to a DOS box, for example, the bare LF will acquire
a matching CR; when transferring to a Unix system, the CR/LF pair will turn
into a plain LF; and when transferring to a Macintosh, both the CR/LF and the
bare LF will probably turn into plain CRs. It's then a simple matter of
looking at the first eight or nine bytes in the file to see whether
text-corruption occurred (which is exactly the sort of thing the Unix file(1)
command is designed to do). (Keep in mind that messing up the signature isn't
that big a deal; the real problem is that CR and LF characters in the image
data--which don't have anything to do with line endings or text but instead
refer to pixel values or more abstract compressor tokens--will also be
converted, thus destroying the image.)
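A signature check along these lines might look like the sketch below. The eight signature bytes are those defined by the PNG specification; the function name, the return strings, and the three corruption patterns are simply a literal reading of the cases described above, so a real transfer program could mangle the bytes in other ways.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # note the CR/LF pair and the bare LF


def diagnose_signature(head: bytes) -> str:
    """Classify the first few bytes of a file that is supposed to be a PNG."""
    if head.startswith(PNG_SIGNATURE):
        return "ok"
    if head.startswith(b"\x89PNG\r\n\x1a\r\n"):
        return "bare LF became CR/LF"      # text-mode transfer to DOS/Windows
    if head.startswith(b"\x89PNG\n\x1a\n"):
        return "CR/LF became LF"           # text-mode transfer to Unix
    if head.startswith(b"\x89PNG\r\x1a\r"):
        return "line endings became CR"    # text-mode transfer to a Macintosh
    return "not a PNG signature"


print(diagnose_signature(PNG_SIGNATURE + b"...rest of file"))  # ok
print(diagnose_signature(b"\x89PNG\n\x1a\n...rest of file"))   # CR/LF became LF
print(diagnose_signature(b"GIF89a"))                           # not a PNG signature
```

In practice one would read the first nine bytes of the file in binary mode and pass them to such a function.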
The second type of integrity-checking is known as a 32-bit cyclic redundancy
check, or CRC-32. PNG images are divided up into logical data chunks, and
each chunk has an associated CRC stored with it. If even one bit in the chunk
changes, the CRC value one would calculate from the damaged data will no longer
match the stored CRC value from the original chunk data. This sort of thing
can easily be tested without decoding the image; in fact, it can be tested on
the fly, as the image is downloaded, if the downloading software is smart
enough.
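Here is a minimal sketch of such a check that never decodes any pixels. It relies on the chunk layout given in the PNG specification (a four-byte length, a four-byte type, the data, then a four-byte CRC computed over the type and data), which this page does not spell out. The helper names are hypothetical, and the input is assumed to be the chunk stream that follows the eight-byte signature.

```python
import struct
import zlib


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def damaged_chunks(stream: bytes) -> list:
    """Return the types of all chunks whose stored CRC no longer matches."""
    bad = []
    pos = 0
    while pos + 12 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, pos)
        chunk_type = stream[pos + 4:pos + 8]
        data = stream[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", stream, pos + 8 + length)
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != stored_crc:
            bad.append(chunk_type)
        pos += 12 + length  # 4 length + 4 type + 4 CRC, plus the data
    return bad


chunks = make_chunk(b"tEXt", b"Comment\x00hello") + make_chunk(b"IEND", b"")
print(damaged_chunks(chunks))  # []

flipped = bytearray(chunks)
flipped[10] ^= 0x01            # flip a single bit inside the first chunk's data
print(damaged_chunks(bytes(flipped)))  # [b'tEXt']
```

This sketch trusts the length field; a robust checker would also guard against a damaged length pointing past the end of the file.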
The third type of integrity check applies only to the image-data chunk(s)
and is similar to the CRC values. Where an image chunk's CRC value applies
to the filtered, compressed data in the chunk, the Adler-32 checksum applies
to the complete stream of uncompressed data (regardless of how many image
chunks that might span). It's really only used by the lowest-level compression
library as a check against bad encoding and decoding software.
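The relationship can be seen directly with Python's zlib module. The sketch below assumes the zlib stream format, whose last four bytes hold the Adler-32 of the uncompressed data; the sample data and the 100-byte piece size are arbitrary stand-ins for real image data and real chunk boundaries.

```python
import zlib

# Stand-in for filtered image data; any bytes will do for this check.
raw = bytes(range(256)) * 4
stream = zlib.compress(raw)

# The trailer of a zlib stream is the Adler-32 of the *uncompressed* data.
stored = int.from_bytes(stream[-4:], "big")
computed = zlib.adler32(raw) & 0xFFFFFFFF
print(stored == computed)  # True

# Cutting the compressed stream into pieces (as multiple image chunks would)
# changes nothing: the decoder sees one continuous stream and one checksum.
pieces = [stream[i:i + 100] for i in range(0, len(stream), 100)]
decoder = zlib.decompressobj()
restored = b"".join(decoder.decompress(piece) for piece in pieces) + decoder.flush()
print(restored == raw)  # True
```

If the trailer and the decoded data disagree, zlib reports an error instead of returning the data.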
See the File Structure chapter of the PNG specification for details.
Pronunciation
No detail was too small for consideration in the authors' quest for a
near-perfect image format; yea, verily, even the acronym and pronunciation
were major topics of discussion. The reason, of course, is the GIF format;
some pronounce it with a soft G like giraffe, some with a hard G like gift,
and no one really knows what they're talking about. (For the record, the soft
G is correct; it is how the author of the format pronounces it.)
"PNG" is always spelled
"PNG" (or "Portable Network Graphics") and
always pronounced "ping" in English, not "pinj" or "pee en gee" or any
other multi-syllabic disaster. (For non-English speakers, the three-letter
pronunciation is fine, however.) See the introduction to the PNG specification
(or the Scope section of the newer ISO/IEC/W3C version) for the definitive
statement on the matter.
Greg follows American English rules, but read "spelt" here if you "favour" the
British "flavour." ;-)
Last modified 14 March 2009.
Copyright © 1996-2009 Greg Roelofs
This page may be freely copied and modified under the terms of the GNU Free
Documentation License.