Unicode fonts and tools for X11
The classic
X Window System
bitmap
fonts are now available in an ISO 10646-1/Unicode extension.
We have extended all the "-misc-fixed-*" fonts:
5x7 -Misc-Fixed-Medium-R-Normal--7-70-75-75-C-50-ISO10646-1
5x8 -Misc-Fixed-Medium-R-Normal--8-80-75-75-C-50-ISO10646-1
6x9 -Misc-Fixed-Medium-R-Normal--9-90-75-75-C-60-ISO10646-1
6x10 -Misc-Fixed-Medium-R-Normal--10-100-75-75-C-60-ISO10646-1
6x12 -Misc-Fixed-Medium-R-Semicondensed--12-110-75-75-C-60-ISO10646-1
6x13 -Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1
6x13B -Misc-Fixed-Bold-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1
7x13 -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-70-ISO10646-1
7x13B -Misc-Fixed-Bold-R-Normal--13-120-75-75-C-70-ISO10646-1
7x14 -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-70-ISO10646-1
7x14B -Misc-Fixed-Bold-R-Normal--14-130-75-75-C-70-ISO10646-1
8x13 -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-80-ISO10646-1
8x13B -Misc-Fixed-Bold-R-Normal--13-120-75-75-C-80-ISO10646-1
9x15 -Misc-Fixed-Medium-R-Normal--15-140-75-75-C-90-ISO10646-1
9x15B -Misc-Fixed-Bold-R-Normal--15-140-75-75-C-90-ISO10646-1
10x20 -Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO10646-1
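The long names listed above are X Logical Font Descriptions (XLFD), in which 14 fields are separated by hyphens. As a minimal sketch, the following Python function splits such a name into its fields. The function name parse_xlfd is only an illustration and is not part of any X11 library.

```python
# The 14 XLFD fields, in the order in which they appear in a font name.
XLFD_FIELDS = (
    "FOUNDRY", "FAMILY_NAME", "WEIGHT_NAME", "SLANT", "SETWIDTH_NAME",
    "ADD_STYLE_NAME", "PIXEL_SIZE", "POINT_SIZE", "RESOLUTION_X",
    "RESOLUTION_Y", "SPACING", "AVERAGE_WIDTH", "CHARSET_REGISTRY",
    "CHARSET_ENCODING",
)


def parse_xlfd(name):
    """Split a fully specified XLFD font name into a dict of its fields.

    Hypothetical helper for illustration; wildcards are not handled.
    """
    if not name.startswith("-"):
        raise ValueError("an XLFD name must start with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


fields = parse_xlfd(
    "-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1"
)
# POINT_SIZE is given in tenths of a point, AVERAGE_WIDTH in tenths of a pixel.
point_size = int(fields["POINT_SIZE"]) / 10
cell_width = int(fields["AVERAGE_WIDTH"]) / 10
```

For the 6x13 font this yields an empty ADD_STYLE_NAME, a point size of 12 and a cell width of 6 pixels, with the character set given by the last two fields.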
Coverage
These fonts now contain all characters found in the following
character sets:
ISO 8859 parts 1-5, 7-10, 13-15 (i.e., all parts except Arabic and Thai)
ISO 6937 and the
CEN MES-1
European Unicode Subset
IBM/Microsoft code pages CP 437, 850, 1251, 1252, and many others
Microsoft/Adobe
Windows Glyph List 4 (WGL4)
KOI8-R
DEC VT100 graphics symbols
In addition, the 6x13, 8x13, 9x15, 9x18, and 10x20 fonts cover a much
larger repertoire, which includes the comprehensive CEN MES-3A
European Unicode 3.2 Subset, the International Phonetic Alphabet,
Armenian, Georgian, Thai, Yiddish, all Latin, Greek, and Cyrillic
characters, all mathematical symbols (including the entire TeX
repertoire), APL, Braille, Runes, and much more. 9x15 and 10x20 also
cover Ethiopic.
Newly added fonts
The following new "-misc-fixed-*" fonts were added:
6x13O -Misc-Fixed-Medium-O-SemiCondensed--13-120-75-75-C-60-ISO10646-1
7x13O -Misc-Fixed-Medium-O-Normal--13-120-75-75-C-70-ISO10646-1
8x13O -Misc-Fixed-Medium-O-Normal--13-120-75-75-C-80-ISO10646-1
9x18 -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1
9x18B -Misc-Fixed-Bold-R-Normal--18-120-100-100-C-90-ISO10646-1
12x13ja -Misc-Fixed-Medium-R-Normal-ja-13-120-75-75-C-120-ISO10646-1
18x18ja -Misc-Fixed-Medium-R-Normal-ja-18-120-100-100-C-180-ISO10646-1
18x18ko -Misc-Fixed-Medium-R-Normal-ko-18-120-100-100-C-180-ISO10646-1
6x13O, 7x13O and 8x13O are oblique/italic versions of 6x13, 7x13
and 8x13. 9x18 is an improved version of 9x15 that has more space
above and below the base characters to increase readability and to
allow overstriking combining characters to work properly. 18x18ja and
18x18ko provide Japanese and Korean doublewidth ideograms for 9x18.
12x13ja provides Japanese doublewidth ideograms for 6x13.
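To illustrate how a double-width supplement pairs with its base font, here is a rough Python sketch that estimates how many pixels a string occupies when wide characters take two cells of the 9x18 font and combining characters overstrike the previous cell. The helper names are made up for this example, and the width rule (East Asian Width classes W and F count as two cells) is a simplifying assumption, not the exact logic of any particular terminal emulator.

```python
import unicodedata


def cell_count(text):
    """Count character cells, assuming wide characters take two cells."""
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # a combining mark overstrikes the previous cell
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        cells += 2 if wide else 1
    return cells


def pixel_width(text, cell_width=9):
    """Pixel width of text in a cell grid (default: the 9-pixel 9x18 cell)."""
    return cell_count(text) * cell_width


latin = pixel_width("A")             # one 9-pixel cell
kanji = pixel_width("\u65e5\u672c")  # two ideographs, each 18 pixels wide
accented = pixel_width("e\u0301")    # e + combining acute shares one cell
```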
Adobe BDF fonts
I have also created revised ISO10646-1 versions of all the Adobe
and B&H pixel fonts that come with X11R6.4. The original fonts
contained about 30 additional PostScript characters (roughly the
CP1252 repertoire) that were present in the old ISO8859-1 BDF files
but were not encoded and therefore not accessible to X clients. The
revised ISO10646-1 versions contain not only these but also many more
automatically generated accented Latin characters (e.g., all
characters from ISO 8859 parts 1-4, 9-10, 13-15), and they also fix a
few long-standing bugs in the old fonts (missing NBSP, exchanged
multiplication/division sign, etc.).
Status
The fonts are now complete and currently implement version 3.2 of
the Unicode standard (ISO 10646-1/Amd.1:2002). I will maintain them to
fix bugs and to satisfy any newly reported user requirements. Note
that the new fonts fix a problem with the Latin-1 quotation mark and
accents.
The fonts are freely available with installation instructions and
example UTF-8 text files:
The "-misc-fixed-*" font package:
CJK ideographic wide character supplement
(unpack into the same subdirectory as the above):
The Adobe and B&H font package:
There is also a
change log
file for the "-misc-fixed-*" fonts.
Other character sets
The font packages include the
ucs2any.pl
Perl script,
which converts ISO 10646-1 fonts into any other encoding for which
there is a
Unicode
mapping table
available. This way, you can quickly generate ISO
8859-* versions from the above fonts automatically, for the benefit of
older software that cannot yet handle ISO 10646-1 fonts directly.
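The idea behind such a conversion can be sketched in a few lines of Python. This is not how ucs2any.pl itself is implemented; it only illustrates the principle, assuming a mapping table with one "target code, Unicode code point" pair per line and "#" comments. The function names, the sample table excerpt and the glyph placeholders are all made up for this example.

```python
def parse_mapping_table(text):
    """Return {target_code: unicode_code_point} parsed from a mapping table."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        cols = line.split()
        if len(cols) < 2:
            continue  # target code without a Unicode assignment
        mapping[int(cols[0], 16)] = int(cols[1], 16)
    return mapping


def reencode(ucs_glyphs, mapping):
    """Select and renumber the glyphs that a target encoding needs.

    ucs_glyphs maps Unicode code points to glyph data, treated as opaque.
    Target codes whose glyph is missing from the source font are skipped.
    """
    return {
        code: ucs_glyphs[ucs]
        for code, ucs in mapping.items()
        if ucs in ucs_glyphs
    }


# Hypothetical excerpt in the style of an ISO 8859-15 mapping table.
SAMPLE_TABLE = """\
# target  Unicode  name
0x41  0x0041  # LATIN CAPITAL LETTER A
0xA4  0x20AC  # EURO SIGN
0xBD  0x0153  # LATIN SMALL LIGATURE OE
"""

table = parse_mapping_table(SAMPLE_TABLE)
source_font = {0x0041: "glyph-A", 0x20AC: "glyph-euro"}  # placeholder data
target_font = reencode(source_font, table)
```

In a real conversion the glyph data would be the bitmap records of the BDF file, and the CHARSET_REGISTRY and CHARSET_ENCODING fields of the font name would have to be rewritten as well.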
Distribution
I periodically contribute a recent snapshot of all of the above
fonts to
XFree86
and they have
been shipping as part of the XFree86 releases since XFree86 4.1. I
have also made them available to
X.Org
for inclusion into one of the next official X11 distributions as a
replacement for the current ISO 8859-1 BDF fonts (hopefully they will
be in X11R6.7).
Who created the original -misc-fixed-* ASCII fonts or the later ISO
8859-1 extension is not documented. They most likely came from either
MIT Project Athena or their industrial partner DEC in the 1980s. Most
of these fonts contained in the header of the BDF file the property
COPYRIGHT "Public domain font. Share and enjoy."
The contributors of the *-ISO10646-1 extensions agreed to keep it that
way, i.e. we haven't changed any of these copyright strings. It is
very unlikely that such low-res pixel fonts could even be copyrighted,
as they are clearly pictures and not computer programs, and as there
are only a limited number of ways to draw such glyphs in a
recognisable way. (Some countries explicitly do not offer any
protection for typefaces, e.g. see 37 C.F.R. § 202.1(e) in the
United States. In some others, any such protection would expire after
25 years or less.)
Related information and links
Read the
UTF-8 and Unicode FAQ for
Unix/Linux
for detailed general information on how to use Unicode
and its ASCII-compatible UTF-8 encoding under Unix, Linux, X11,
etc.
To use these ISO10646-1 fonts, you will need applications that
support ISO10646-1 fonts (hardly any software released before ~2001
does). These are not simply 8-bit replacement fonts but usually need
to be used together with UTF-8 support in an application. For
instance, if you want to use these fonts with xterm, you need to use
an
xterm
version that can handle ISO10646-1 fonts
(e.g., the one in XFree86
4.x).
The "-misc-fixed-*" fonts were created and extended using Mark
Leisher’s
xmbdfed
font editor, which later evolved into
the
gbdfed
font editor (using GTK+ instead of Motif). You can use the latter to
view and modify these fonts.
Unicode X11 font names end with -ISO10646-1. This is the value in
the official X registry for the Logical Font Descriptor (XLFD) fields
CHARSET_REGISTRY and CHARSET_ENCODING for all Unicode and ISO 10646-1
16-bit fonts. There is no registered XLFD scheme yet for ISO 10646
characters outside the BMP, though some proposals have been
discussed.
Unicode and ISO 10646 merged CJK ideograph repertoires from
several groups of national source standards. In order to indicate that
an ISO10646-1 font with ideographic characters was designed following
the glyph style from one particular group of national source
standards, the
ADD_STYLE_NAME
XLFD field can be used to
indicate the corresponding language or region. Examples for such
ADD_STYLE_NAME
values are:
ADD_STYLE_NAME  IRG source countries        Standards
zh              China, Hongkong, Singapore  GB2312, GB12345, GB7589, GB7590, GB8565, GB16500
zh_TW           Taiwan                      CNS 11643
ja              Japan                       JIS X 0208, JIS X 0212
ko              Korea                       KS C 5601, KS C 5657, PKS C 5700
vi              Vietnam                     TCVN 5773, TCVN 6056
ISO 639 language and ISO 3166 country/region codes should be
preferred in ADD_STYLE_NAME. This way, if an application knows a style
preference from, for example, the RFC 1766 code "xx-yy" in a language
tag, or from the locale name "xx_YY.ZZZ", it can search for a suitable
font by looking for ADD_STYLE_NAME values in the following order of
preference: "xx_YY", "xx", "", "*".
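The search order described above could be derived from a locale name or language tag roughly as follows. This is only a sketch: the function name add_style_preferences is hypothetical, and real applications may normalise locale names differently.

```python
def add_style_preferences(locale_name):
    """Return ADD_STYLE_NAME values to try, most specific first.

    Accepts a locale name such as "xx_YY.ZZZ" or an RFC 1766 style
    tag such as "xx-yy". Hypothetical helper for illustration only.
    """
    # Strip the codeset (".ZZZ") and any "@modifier" suffix.
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    base = base.replace("-", "_")  # treat "xx-yy" like "xx_yy"
    lang, _, region = base.partition("_")
    prefs = []
    if lang and region:
        prefs.append(f"{lang.lower()}_{region.upper()}")
    if lang:
        prefs.append(lang.lower())
    # Fall back to fonts without a style name, then to any font.
    prefs.extend(["", "*"])
    return prefs


taiwan = add_style_preferences("zh_TW.Big5")
japanese = add_style_preferences("ja")
tagged = add_style_preferences("zh-tw")
```

An application would then substitute each value in turn into the ADD_STYLE_NAME field of an XLFD pattern and stop at the first pattern that matches an installed font.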
The specification of the BDF X11 pixel font format is available
as Technical Note 5005 from Adobe or in a simplified form as part of
the X11 documentation.
The fonts use the Adobe standard glyph names for Unicode fonts.
There is also a note on why the apostrophe and grave accent look
different in the new fonts.
A bug in old GTK+ libraries is triggered by the presence of ISO10646-1
Helvetica fonts.
Other information relevant to Unicode font projects
The FreeType Project has implemented an open source version of the
Microsoft/Apple TrueType font rendering engine.
OpenType is a merger of the TrueType and Type1 font formats, with
extensions for better handling of all scripts supported by Unicode.
ttf2bdf and
xmbdfed
by Mark Leisher allow you to generate X11 BDF pixel fonts
from TrueType fonts and edit them.
PfaEdit
is a free
Type1, TrueType, OpenType and BDF font editor by George Williams.
Two free PostScript Type1 font editors that are currently under
development are
Spif
by Matty Farrow
and
gfonted
by Raph
Levien.
Commercial vector font editors: Macromedia Fontographer and FontLab.
Taco Hoekwater
has
created a Unicode math font that can be used with Postscript Times
Roman.
xfsft: TrueType Font Support For X by Juliusz Chroboczek is a set of
patches that make X11 support TrueType fonts (included with XFree86
4.0).
Fonts
in the X Window System
by
Nelson Beebe
is a useful
quick introduction. The
XFree86
Font Deuglification Mini HOW-TO
by
Doug Holland
also contains
numerous good tips for setting up fonts under Linux/X11.
Luc Devroye’s
Font
software
page is a
very
comprehensive collection of
pointers to further font resources.
Ulf Jordan’s Misc-Fixed ISO 10646-1 Outline Font Project aims to
develop Type1 versions of the BDF font family provided here.
Nick Reinking prepared TrueType versions of the Latin-1 subsets of
6x13 and 6x13B.
The
Electronic Font
Open Laboratory
in Japan is also working on a family of Unicode
bitmap fonts.
Why are there no Indic or Syriac glyphs in the ucs-fonts
package?
In European and East Asian scripts, each Unicode character can be
represented by a single graphical shape ("glyph"). The X11 font system
is entirely built around the idea that there is a one-to-one
relationship between characters and glyphs, which works fine for
Latin, Greek, Cyrillic, Hebrew, Han, Hiragana, Katakana, Hangul, etc.
However, things are far more complicated for handwritten cursive
scripts such as Arabic, Syriac and the various Indic scripts
(Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu,
Kannada, etc.). For these scripts, the sequence of values
("characters") encoded in a Unicode string (which usually corresponds
to the sequence of keystrokes during entry and the sequence of
phonemes when speaking) first has to be converted into a sequence of
graphical symbols ("glyphs") as they are found in a font, before a
string can be displayed. In a given Latin font style, always the same
graphical glyph of a font will be used for representing a character on
a screen. In an Arabic or Indic font, the shape of the glyph depends
not only on the character that it represents, but also on its
neighbour characters. Sometimes, different glyphs have to be used
depending on the character appearing at the beginning, middle, or end
of a word, and often certain entire sequences of characters have to be
represented by a special ligature glyph. A very simple form of that is
used in Latin fine typography in the form of the "fi" and "fl"
ligatures, but in Indic scripts, the situation is far more extreme,
and the number of glyphs is often several times the number of
characters. For details and examples, read
Chapter 9
and
Chapter
as well as the relevant
code charts
of
The Unicode
Standard.
The Unicode standard does contain encoding ranges for a simple
scheme of Arabic glyphs, the "Arabic Presentation Forms". This was
possible, because for Arabic there is a reasonably good consensus
among font designers on how many glyphs are actually necessary for
proper rendering of Arabic text, even though some argue that for
really high-quality typesetting the Unicode collection of Arabic
presentation forms is not sufficient. For Indic scripts, on the other
hand, there seems to be no consensus among font designers on which
glyphs are actually necessary, as this can vary significantly across
different font styles. Therefore, an Indic font is always a
proprietary non-standardized collection of glyphs together with a
mapping table that defines how sequences of standard Unicode
characters have to be transformed into sequences of non-standard
Indic glyphs from this particular font before the text can be
displayed.
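The relationship between a character and its contextual glyphs can be inspected with the Unicode character database that ships with Python. The sketch below looks up the Arabic Presentation Forms-B characters whose compatibility decomposition points back to a given base letter. The helper name presentation_forms is made up for this example, and the scan is limited to the block U+FE70 to U+FEFF, so ligatures and the Presentation Forms-A block are deliberately ignored.

```python
import unicodedata


def presentation_forms(base_char):
    """Map positional form names to presentation-form code points.

    Scans Arabic Presentation Forms-B (U+FE70..U+FEFF) for characters
    whose decomposition is "<form> XXXX" with XXXX equal to base_char.
    """
    target = f"{ord(base_char):04X}"
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        decomp = unicodedata.decomposition(chr(cp)).split()
        # Single-letter forms look like ["<initial>", "0628"];
        # ligatures and unassigned code points have other lengths.
        if len(decomp) == 2 and decomp[1] == target:
            forms[decomp[0].strip("<>")] = cp
    return forms


# One encoded character, ARABIC LETTER BEH (U+0628), has four glyph forms.
beh_forms = presentation_forms("\u0628")
for form, cp in sorted(beh_forms.items()):
    print(f"{form:9} U+{cp:04X} {unicodedata.name(chr(cp))}")
```

A renderer for Arabic has to make the reverse choice: given the single character in the text, it picks the isolated, initial, medial or final glyph depending on the neighbouring letters.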
The OpenType font format developed by Microsoft and Adobe is an
outline font format that does include such character/glyph mapping
tables. The BDF format used by X11 pixel fonts does not have any
standardized way of including a character/glyph mapping table, and
neither current BDF editors such as xmbdfed nor X servers support
one. The Pango rendering library developed for the GNOME project can
make use of BDF glyph fonts, but it requires the corresponding
character/glyph mapping table in a separate client-side file. The X11
standards currently provide no support for transmitting such mapping
tables over the X11 protocol. Roman Czyborra’s GNU Unifont does
contain a naive representation of the Indic glyphs shown in the
Unicode Standard code charts, but that is of no use in practice for
displaying Indic strings properly.
Summary: X11 was never designed for Arabic, Syriac, or Indic scripts,
and special libraries such as Pango have to be used for them. If you
want to help get Indic scripts supported under X11, you have to extend
the X11 standards to fix this problem and provide a font mechanism
that understands that some scripts need to map characters into glyphs.
The solution is unfortunately not as easy as just drawing a few glyphs
with a font editor; otherwise we would have added the Indic scripts to
the ucs-fonts package long ago.
Markus Kuhn
created 1998-09-22 –
last modified 2022-12-07 –
UK