Internationalization techniques: Authoring web pages
This page lists links to resources on the W3C Internationalization Activity site and elsewhere that help you author HTML and CSS for internationalization.
You are not expected to read this page from top to bottom. Instead, go directly to the topics that interest you.
Characters
Getting started
Background reading
Character encodings for beginners
What is a character encoding, and why should I care?
Introducing character sets and encodings
A brief introduction to some of the concepts associated with character sets and encodings and the Web, with pointers to various techniques sections.
Character encodings: Essential concepts
Basic introductions to concepts related to character encoding. Includes: Unicode, character sets, coded character sets, character encodings, the document character set, character escapes, XHTML and MIME types, and standards vs. quirks modes.
Unicode
Character sets, coded character sets, and encodings
The Document Character Set
Character escapes
The HTTP header
Quick tips: Encoding
One of the 10 quick tips for internationalization.
Quick tips: Escapes
One of the 10 quick tips for internationalization.
Choosing and applying a character encoding
See also
This section is specifically about how to choose a character encoding for your content and ensure that the content is in that encoding.
For information about how to declare the encoding so that the browser knows how to read your content, see Declaring the character encoding for HTML and Declaring the character encoding for a CSS style sheet.
See also the dedicated section about Changing to UTF-8.
Choose UTF-8 for all content.
If you really can't use a Unicode encoding, use only those legacy encodings listed in the Encoding specification.
Avoid the following encodings: UTF-16, UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), CESU-8, UTF-7, BOCU-1, SCSU, and any encoding based on ISO-2022 or EBCDIC.
How to's
Choosing & applying a character encoding
Which character encoding should I use for my content, and how do I apply it to my content?
Useful reference links
Encoding, 4.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
HTML5, 13.2.3.3 Character encodings
Recommendations for support of particular encodings for browsers implementing HTML5.
Background reading
Who uses Unicode?
Are corporate Web sites using Unicode right now?
Document character set
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?
Unicode over 60 percent of the web
A Google blog post by Mark Davis. A graph showing the growth of Unicode encodings to over 60% of Web pages (around 80% if you include ASCII-only pages).
IANA charset registry
The official registry of character encoding names.
Changing to UTF-8
See also
This section is specifically about how to migrate your content to the UTF-8 (Unicode) encoding. For more general advice, see Choosing and applying a character encoding.
For information about how to declare the encoding so that the browser knows how to read your content, see Declaring the character encoding for HTML and Declaring the character encoding for a CSS style sheet.
Save the data as UTF-8; don't just change the encoding declaration.
Declare the encoding in your page.
Ensure that your server does the right thing; in particular, check that it does not declare a different encoding in the HTTP header.
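Re-saving changes the bytes of the file, which editing the declaration alone does not do. The Python sketch below illustrates the conversion, assuming the legacy files are in windows-1252 (Python's cp1252 codec); transcode_to_utf8 is a hypothetical helper, and the source encoding must be replaced with the one your files actually use.

```python
def transcode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode legacy bytes as UTF-8.

    Assumption: source_encoding is the encoding the file was really saved
    in. If it is wrong, characters may be mis-converted.
    """
    # Strict decoding raises UnicodeDecodeError on bytes that are invalid in
    # the source encoding, instead of replacing them with U+FFFD.
    text = data.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


legacy = "café".encode("cp1252")       # b'caf\xe9': one byte for é
converted = transcode_to_utf8(legacy)  # b'caf\xc3\xa9': two bytes for é
```

Once the bytes are UTF-8, update the in-page declaration to match.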
How to's
Changing an HTML page encoding to UTF-8
How do I change the encoding of my HTML pages to UTF-8?
The byte-order mark (BOM) in HTML
What is the byte-order mark, and what do I need to know about it when creating HTML?
Migrating to Unicode
Detailed guidelines for the migration of software and data to Unicode.
Background reading
Document character set
What is the 'Document Character Set' for XML and HTML, and how does it relate to the encodings I use for my documents?
Declaring the character encoding for HTML
See also
This section is specifically about how to declare the character encoding of your HTML page.
For advice about which encoding to choose, see Choosing and applying a character encoding.
For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.
Use the HTTP header if it is available.
Always use an in-document encoding declaration, even if you are also using the HTTP header.
Ensure that the encoding declaration fits within the first 1024 bytes of the page.
If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
Do not use the charset attribute on a or link elements.
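Whether a declaration falls inside the first 1024 bytes can be checked roughly in Python. This is a sketch under simplifying assumptions: the regular expression only recognizes the two common meta forms, and declared_charset_in_prescan is a hypothetical name; browsers use the fuller prescan algorithm defined in the HTML specification.

```python
import re
from typing import Optional

# Browsers look for an in-document encoding declaration only within the
# first 1024 bytes of the page.
PRESCAN_LIMIT = 1024

# Rough pattern for <meta charset="..."> and for
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
# A quick check only, not a reimplementation of the HTML prescan algorithm.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def declared_charset_in_prescan(page: bytes) -> Optional[str]:
    """Return the charset declared within the first 1024 bytes, or None."""
    match = _META_CHARSET.search(page[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None
```

A page whose only declaration sits after a long comment or script block returns None here, which is the situation the recommendation warns against.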
How to's
Declaring character encodings in HTML
How should I declare the encoding of my HTML file? This page contains a quick reference section, followed by more detailed information.
Useful reference links
Encoding, 4.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
HTML5
8.2.2 The input byte stream
Detailed technical information for browser implementers about how pages are parsed for recognition of the character encoding.
2.1.6 Character encodings
Information about preferred MIME names, ASCII compatible encodings, and Unicode characters.
4.2.5.5 Specifying the document's character encoding
How to use the meta element to declare the encoding.
Background reading
Serving HTML & XHTML
Introduces doctypes, MIME types, and the influence of standards vs. quirks mode on character encoding declarations.
Handling character encodings in HTML and CSS
Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.
IANA charset registry
The official registry of character encoding names.
XHTML 1.0
C.1. Processing Instructions and the XML Declaration
Recommendation to avoid the XML declaration for compatibility with HTML.
C.9. Character Encoding
Recommendations on use of meta encoding declarations for HTML compatibility.
Polyglot Markup: HTML-Compatible XHTML Documents
How to specify the encoding of documents that work as both HTML5 and XHTML5.
3. Specifying a Document's Character Encoding
How to specify the encoding of documents that work as both HTML5 and XHTML5.
2. Processing Instructions and the XML Declaration
The XML declaration is not allowed in documents that work as both HTML5 and XHTML5.
Character Model for the World Wide Web, 4.4.1 Mandating a unique character encoding, C034
Use the encoding declaration mechanisms that are available.
HTML: The Markup Language, 4.2. Character encoding declaration
How to declare the character encoding in HTML5.
Extensible Markup Language (XML) 1.0, 4.3.3 Character Encoding in Entities
How to declare encodings in XML, with particular reference to the XML declaration.
HTML 4.01, 5.2.2 Specifying the character encoding
Character encoding information in the HTML 4.01 spec.
Declaring the character encoding for a CSS style sheet
See also
This section is specifically about how to declare the character encoding of your CSS style sheet.
For advice about which encoding to choose, see Choosing and applying a character encoding.
For further advice about setting the character encoding on the server, see Setting the HTTP charset parameter and Setting character encoding information using .htaccess.
If you use UTF-8 as the character encoding for your style sheets and your HTML pages, and declare that encoding in your HTML, there is no need to declare the encoding for your style sheet.
If you use @charset, ensure that nothing (except a BOM) comes before it in the style sheet, and use the exact syntax @charset "name"; with a single space, double quotes, and the semicolon immediately after the closing quote.
If you cannot use UTF-8, use the preferred encoding name indicated in the Encoding specification.
Do not use the charset attribute on a or link elements.
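The exact-syntax rule can be checked mechanically. A minimal Python sketch, assuming the style sheet is available as raw bytes and only a UTF-8 BOM needs to be considered; css_charset_label is a hypothetical helper, not part of any library. It mirrors the byte pattern that CSS Syntax Level 3 looks for at the very start of a style sheet.

```python
import re
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"

# CSS Syntax Level 3 honours @charset only when the style sheet begins with
# exactly: @charset "<label>";  (lowercase keyword, one space, double quotes,
# and the semicolon straight after the closing quote).
_CHARSET_RULE = re.compile(rb'@charset "([\x00-\x21\x23-\x7f]*)";')


def css_charset_label(stylesheet: bytes) -> Optional[str]:
    """Return the @charset label if the rule uses the exact syntax, else None."""
    if stylesheet.startswith(UTF8_BOM):
        # A BOM may come first; when it does, the BOM itself settles the encoding.
        stylesheet = stylesheet[len(UTF8_BOM):]
    match = _CHARSET_RULE.match(stylesheet[:1024])
    return match.group(1).decode("ascii") if match else None
```

Single quotes, an extra space, or a comment before the rule all cause the declaration to be ignored, which is why the sketch returns None for them.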
How to's
CSS character encoding declarations
How do I declare the character encoding of a CSS style sheet?
Useful reference links
Encoding, 4.2 Names and labels
If you have a good reason for not using UTF-8, then use only the encodings and labels shown in the left column of this table.
Spec links
CSS Syntax Level 3, 3.2. The input byte stream
Character encoding information in the CSS Level 3 spec.
Handling character encodings in HTML and CSS
Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.
IANA charset registry
The official registry of character encoding names.
CSS 2.1, 4.4 CSS style sheet representation
Character encoding information in the CSS 2.1 spec.
Using escapes to represent characters
Avoid using escapes whenever possible. If you use UTF-8, you can represent all the characters you need directly.
Use escapes for invisible or ambiguous characters.
Use CSS escapes, rather than HTML escapes, for CSS embedded in HTML.
Always use Unicode code points for the numeric part of a character escape. Do not use code point values from non-Unicode encodings.
Use a single escape (representing the Unicode code point value) for supplementary characters. Do not escape surrogate pairs.
Ensure that all href attribute values have escaped ampersands in query parameters, i.e. &amp; rather than just &.
Avoid named character entities in XHTML.
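The escaping rules above can be sketched in Python. The helper names are hypothetical; html.escape is from Python's standard library. Because Python strings are sequences of code points, ord() yields the full Unicode value, so a supplementary character such as U+1F600 produces one escape rather than a surrogate pair.

```python
import html


def html_escape_char(ch: str) -> str:
    """Hexadecimal numeric character reference for one character."""
    # ord() returns the Unicode code point, so U+1F600 becomes a single
    # escape (&#x1F600;) and never a pair of surrogate escapes.
    return f"&#x{ord(ch):X};"


def css_escape_char(ch: str) -> str:
    """CSS escape: backslash, hex code point, then a terminating space."""
    # The space ends the escape, so a following hex digit is not read as part of it.
    return f"\\{ord(ch):X} "


def escape_href(url: str) -> str:
    """Escape a URL for use in an HTML attribute value (& becomes &amp;)."""
    return html.escape(url, quote=True)
```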
How to's
Using character escapes in markup and CSS
How can I use character escapes in markup and CSS, and when should I use or not use them?
Useful reference links
HTML5, 8.5 Named character references
Character reference names that are supported by HTML, and the code points to which they refer.
Spec links
HTML5, 8.5 Named character references
Character reference names that are supported by HTML, and the code points to which they refer.
Handling character encodings in HTML and CSS
Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.
HTML 4.01
5.3 Character references
Numeric character references and character entity references described in the HTML4 spec.
24 Character entity references in HTML 4
List of character entity references supported by HTML 4 and XHTML 1.0.
XHTML 1.0
C.12. Using Ampersands in Attribute Values (and Elsewhere)
Advice on use of ampersands in href attributes in XHTML.
C.16. The Named Character Reference &apos;
Advice to not use &apos in XHTML.
Checking the encoding of a document
How to's
W3C Internationalization Checker
Shows the HTTP header information for a page, and all in-page encoding declarations. Also highlights conflicts.
Checking HTTP Headers
How can I check the character encoding information sent in the HTTP header of a web document?
Checking the character encoding using the validator
How can I check that the character encoding of my document is correct using the W3C HTML Validator?
Useful reference links
Rex Swain's HTTP Viewer
Shows the HTTP header information for a page.
HTTP Header Checker by KeyCDN
Shows the HTTP header information for a page.
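To see what the server declares, you need the charset parameter of the Content-Type header. A Python sketch that parses a header value with the standard library's email.message.Message; charset_from_content_type is a hypothetical helper name, and fetching the header from a live server is not shown.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Extract the charset parameter from an HTTP Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the parameter lowercased, or None if absent.
    return msg.get_content_charset()
```

If you fetch the page with urllib.request, the response's headers object offers the same get_content_charset() method, because it is built on the same class.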
Handling the byte-order mark (BOM)
If you use the byte-order mark with UTF-8-encoded pages, check that any scripts and back-end processes can handle the BOM.
If, despite the advice to avoid it, you encode your page as UTF-16, always ensure that it starts with a BOM.
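A small Python sketch of BOM handling in back-end scripts, assuming UTF-8 input; split_utf8_bom and stray_boms are hypothetical helpers. The second one targets a common failure: a script that concatenates several files which each start with a BOM leaves U+FEFF characters in the middle of the page. Python's utf-8-sig codec removes only a leading BOM.

```python
import codecs
from typing import Tuple


def split_utf8_bom(data: bytes) -> Tuple[bool, bytes]:
    """Return (had_bom, data_without_bom) for UTF-8 encoded bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


def stray_boms(text: str) -> int:
    """Count U+FEFF characters after the first position of decoded text.

    A BOM is only meaningful at the very start of a file; anywhere else it
    usually means several BOM-prefixed files were joined together.
    """
    return text[1:].count("\ufeff")


# Two fragments that each begin with a BOM, concatenated by a script:
fragment = codecs.BOM_UTF8 + "<p>Hello</p>".encode("utf-8")
page = (fragment + fragment).decode("utf-8")  # plain "utf-8" keeps BOMs as U+FEFF
```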
How to's
The byte-order mark (BOM) in HTML
What is the byte-order mark, and what do I need to know about it when creating HTML?
Useful reference links
W3C Internationalization Checker
Tells you whether your page starts with a BOM, and whether there is a BOM later in the content.
Spec links
HTML5, 8.2.2 The input byte stream
How HTML5 detects the character encoding of a page, and mentions how browsers should handle BOM detection.
CSS Syntax Level 3, 3.2. The input byte stream
Character encoding information in CSS. Mentions how browsers should handle the BOM.
Handling character encodings in HTML and CSS
Tutorial style article that gathers together and organizes pointers to articles that, taken together, help you understand how to handle the essential aspects of authoring HTML and CSS related to characters and character encodings.
CSS 2.1, 4.4 CSS style sheet representation
Character encoding information in the CSS 2.1 spec. Mentions how browsers should handle the BOM.
Handling character normalization
Ensure that all HTML class names and CSS selectors are saved using the same Unicode normalization form (NFC is recommended).
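Why normalization matters for selectors: the two strings below look identical but are different code point sequences, and a selector only matches a class name with the same code points. A Python sketch using the standard unicodedata module; to_nfc and is_nfc are hypothetical helper names.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """True if text is already in NFC."""
    return to_nfc(text) == text


# The same visible class name, typed in two different ways:
composed = "caf\u00e9"     # é as a single code point, U+00E9
decomposed = "cafe\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT
```

Running class names and selectors through to_nfc before saving makes both spellings identical.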
How to's
Normalization in HTML and CSS
What are normalization forms, and why do I need to know about them when creating HTML and CSS content?
Useful reference links
W3C Internationalization Checker
Tells you whether your HTML page contains non-NFC class names and ids.
Handling encoding issues in forms
Use UTF-8 for the character encoding of your page.
Consider checking on the server that form data is arriving in UTF-8.
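A server-side check can be sketched with Python's standard library, assuming the form is submitted as application/x-www-form-urlencoded; decode_form_body is a hypothetical helper. Decoding with errors="strict" makes non-UTF-8 data fail loudly instead of being silently replaced.

```python
from typing import Dict, List
from urllib.parse import parse_qs


def decode_form_body(body: bytes) -> Dict[str, List[str]]:
    """Decode an application/x-www-form-urlencoded body as UTF-8.

    Raises UnicodeDecodeError if the body or any percent-encoded value is
    not valid UTF-8, so the caller can reject or log the request.
    """
    # A urlencoded body should be pure ASCII; raw non-ASCII bytes are an error too.
    text = body.decode("ascii")
    return parse_qs(text, encoding="utf-8", errors="strict")
```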
How to's
Multilingual form encoding
What is the best way to deal with encoding issues in forms that may use multiple languages and scripts?
Using Unicode control codes
See also
If you represent control codes using character escapes, see also Using escapes to represent characters for more information.
Don't use Unicode control characters if there is markup that does the same job.
Use character escapes to represent control codes, so that they are visible in your source.
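To keep invisible characters visible in source, a script can replace them with numeric character references before the text is written into markup. A Python sketch, assuming that Unicode format characters (general category Cf, which includes bidi controls such as U+200F RIGHT-TO-LEFT MARK and joiners such as U+200D) are the ones you want to expose; make_format_chars_visible is a hypothetical name. C0 and C1 control codes are deliberately left out, because references to most of them are parse errors in HTML, and most C0 codes are not allowed at all in XML 1.0.

```python
import unicodedata


def make_format_chars_visible(text: str) -> str:
    """Replace invisible format characters with hexadecimal character references.

    Other markup escaping (such as & and <) is assumed to happen separately.
    """
    out = []
    for ch in text:
        # General category Cf covers bidi controls (e.g. U+200F RIGHT-TO-LEFT
        # MARK, U+202B RIGHT-TO-LEFT EMBEDDING), joiners and similar characters.
        if unicodedata.category(ch) == "Cf":
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)
```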
How to's
Characters or markup?
There are a range of control-like Unicode characters, some of which fulfill the same role as markup. Which should I use, and which should I avoid?
Unicode in XML & Other Markup Languages
Guidelines on the use of the Unicode Standard in conjunction with markup languages such as XML.
Unicode controls vs. markup for bidi support
To correctly format bidi text in HTML or XML content, should I use Unicode control codes or markup?
Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
HTML, XHTML, XML and Control Codes
How do I handle control codes (i.e. the C0 range U+0000-U+001F, plus U+007F and the C1 range U+0080-U+009F) in XML, XHTML and HTML?
Working around unavailable characters/glyphs
How to's
Missing characters and glyphs
What to do if a Unicode character or font glyph is missing.
Using non-ASCII web addresses
Useful reference links
Internationalized country code top-level domain
Wikipedia article. Contains news about recent developments.
Internationalized domain name
Wikipedia page.
mod_fileiri: new Apache module under development
Martin Dürst's fileiri Apache module.
Spec links
RFC 3987 Internationalized Resource Identifiers (IRIs)
IETF Proposed Standard for handling of IRIs.
Unicode Technical Report #36 Unicode Security Considerations
Describes security issues related to phishing.
Background reading
An Introduction to Multilingual Web Addresses
How IDN and IRIs work, aimed at content authors and general users who want to understand the basics without too many gory technical details.
Other links
Examples of registered IDNs
Lists of IDNs that work with links to the sites.
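How non-ASCII addresses are converted to ASCII can be sketched with Python's standard library. Assumptions: the host name goes through Python's built-in idna codec, which implements the older IDNA 2003 rules (browsers apply UTS #46 processing, so results can differ for some characters), and the path follows the IRI-to-URI mapping of RFC 3987, which percent-encodes the UTF-8 bytes of each non-ASCII character. The helper names are hypothetical.

```python
from urllib.parse import quote


def host_to_ascii(host: str) -> str:
    """Convert an internationalized host name to its ASCII (Punycode) form.

    Uses Python's built-in "idna" codec (IDNA 2003); browsers use UTS #46
    processing, so a few characters are handled differently.
    """
    return host.encode("idna").decode("ascii")


def iri_path_to_uri(path: str) -> str:
    """Percent-encode the UTF-8 bytes of non-ASCII characters in a path.

    "/" is kept as a separator. Apply this to raw paths only: an existing
    "%" would be encoded again.
    """
    return quote(path, safe="/")
```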
Language
Getting started
Background reading
Language on the Web
W3C Getting Started article.
Working with language in HTML
W3C tutorial.
Language tags in HTML and XML
How to choose the right attribute values. W3C article.
Quick tip: Language
One of the 10 quick tips for internationalization.
Why use the language attribute?
Why should I use the language attribute in web pages?
Declaring the overall language of a page
See also
For detailed advice about how to select the right language tags, see Choosing language tags.
See also Declaring metadata about the language(s) of the intended audience.
Always declare the default language for text in the page using attributes on the html tag.
Do NOT use the meta element with the http-equiv attribute set to Content-Language.
Use language attributes rather than HTTP to declare the default language for 'text processing' (i.e. when the language needs to be known for things such as font choice, styling, spell-checking, hyphenation, quote mark styling, etc.).
Do not declare the default language of a document in the body element; use the html element.
Where a document contains content aimed at speakers of more than one language, decide whether you want to declare one language in the html tag, or leave the language undeclared there and declare it further down the document.
Where a document contains content aimed at speakers of more than one language, try to divide the document linguistically at the highest possible level, and declare the appropriate language for each of those divisions.
For HTML use the lang attribute only; for XHTML 1.0 served as text/html use both the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
How to's
Declaring Language in HTML
How should I set the language of the content in my HTML page?
Background reading
Types of language declaration
Describes two different types of language information, 'metadata' and 'text-processing', and how they differ.
Why use the language attribute?
Why should I use the language attribute in web pages? A number of useful reasons.
HTTP headers, meta elements and language information
For HTML, should we put language declarations in HTTP headers and meta elements, and how are they different from those in language attributes?
Spec links
HTML
3.2.6.2 The lang and xml:lang attributes
The language attributes in HTML.
4.2.5.3 Pragma directives
How HTML deals with a meta element with http-equiv set to Content-Language.
Tests
HTML5, the lang attribute
HTML 4.01, 8.1 Specifying the language of content: the lang attribute
lang attribute definition in HTML 4.01
XML 1.0, 2.12 Language Identification
xml:lang attribute definition in XML.
WCAG, Guideline 4. Clarify natural language usage
Recommendation to express natural language in a document in Web Content Accessibility Guidelines.
WCAG, 2.2 Identifying the primary language
Recommendation to use the lang attribute on the html tag, in Web Content Accessibility Techniques for HTML.
XHTML 1.1, 3. The XHTML 1.1 Document Type
The 2nd edition introduced the lang attribute to go with the xml:lang attribute.
XHTML 1.0, C.7 The lang and xml:lang Attributes
xml:lang and lang attribute definitions in XHTML 1.0.
HTTP 1.1, 14.12 Content-Language
Content-Language definition in HTTP 1.1.
Polyglot markup, 7.2 Language Attributes
Using lang and xml:lang in HTML5 polyglot documents.
Polyglot markup, 6.5.1.1 Content-Language
Content-Language and HTML5 polyglot documents.
Identifying in-document language changes
See also
See also Declaring the overall language of a page.
For detailed advice about how to select the right language tags, see Choosing language tags.
When the page contains content in another language, add a language attribute to an element surrounding that content.
For HTML use the lang attribute only; for XHTML 1.0 served as text/html use both the lang and xml:lang attributes; and for XHTML served as XML use the xml:lang attribute only.
If the text in an attribute value and the element content are in different languages, consider nesting elements so that each can carry its own language declaration.
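Language declarations are inherited: text takes its language from the nearest enclosing element with a lang attribute. A Python sketch using the standard html.parser module makes that rule concrete; LangTracker is a hypothetical class, it ignores xml:lang, and it assumes well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class LangTracker(HTMLParser):
    """Record which language applies to each run of text."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Optional[str]] = []  # language in effect per open element
        self.runs: List[Tuple[Optional[str], str]] = []

    def _current(self) -> Optional[str]:
        return self.stack[-1] if self.stack else None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in VOID_ELEMENTS:
            return  # void elements have no content, so nothing to push
        values = dict(attrs)
        if "lang" in values:
            # A bare attribute is reported as None; treat it as lang="" (unknown).
            self.stack.append(values["lang"] or "")
        else:
            # No lang attribute: inherit from the parent element.
            self.stack.append(self._current())

    def handle_endtag(self, tag: str) -> None:
        if tag not in VOID_ELEMENTS and self.stack:
            self.stack.pop()

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.runs.append((self._current(), data.strip()))
```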
How to's
Declaring Language in HTML
How should I set the language of the content in my HTML page?
Background reading
Why use the language attribute?
Why should I use the language attribute in web pages? A number of useful reasons.
Spec links
HTML5, 3.2.3.3 The lang and xml:lang attributes
The language attributes in HTML5.
HTML 4.01, 8.1 Specifying the language of content: the lang attribute
lang attribute definition in HTML 4.01
XHTML 1.0, C.7. The lang and xml:lang Attributes
Use both lang and xml:lang.
XHTML 1.1, 3. The XHTML 1.1 Document Type
The 2nd edition introduced the lang attribute to go with the xml:lang attribute.
XML 1.0, 2.12 Language Identification
xml:lang attribute definition in XML.
WCAG, Guideline 4. Clarify natural language usage
Recommendation to express natural language in a document in Web Content Accessibility Guidelines.
WCAG Techniques, 2.1 Identifying changes in language
Recommendation to use the lang attribute when the language changes in a document, in Web Content Accessibility Techniques for HTML.
Polyglot markup, 7.2 Language Attributes
Using lang and xml:lang in HTML5 polyglot documents.
Choosing language tags
Use subtags as defined by BCP 47 for language attribute values.
Use the shortest possible language tag values.
Where possible, use the codes zh-Hans and zh-Hant to refer to Simplified and Traditional Chinese, respectively.
Use the subtag zxx when the text is known not to be in any language.
When the language is undetermined and you have to label it, use lang="".
If you are serving XML and the format you are using supports it, use xml:lang=""; otherwise use xml:lang="und" when the language is undetermined and you have to label it.
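A simplified Python sketch that checks and tidies the most common tag shape, language[-script][-region], such as zh-Hans or en-US. It is not a full BCP 47 validator: variants, extensions and private-use subtags are rejected, and nothing is checked against the IANA registry. normalize_simple_tag is a hypothetical name; the casing follows the conventions recommended in RFC 5646 (tags themselves are case-insensitive).

```python
import re

# Simplified subset of BCP 47: language[-script][-region] only.
_SIMPLE_TAG = re.compile(
    r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?$"
)


def normalize_simple_tag(tag: str) -> str:
    """Apply conventional casing: lowercase language, title-case script,
    uppercase region (e.g. zh-Hans-CN)."""
    match = _SIMPLE_TAG.match(tag)
    if not match:
        raise ValueError(f"not a simple language[-script][-region] tag: {tag!r}")
    language, script, region = match.groups()
    parts = [language.lower()]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    return "-".join(parts)
```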
How to's
Choosing a Language Tag
Which language tag is right for me? How do I choose language and other subtags? Covers all the subtag types in the latest version of BCP 47.
Language tags in HTML and XML
A simple overview of the syntax for language tags in BCP 47.
Tagging text with no language
How do I use language markup in HTML or XML content when I don't know the language, or the content is non-linguistic?
Two-letter or three-letter language codes
Should I use two-letter or three-letter ISO language codes in language tags? W3C article.
Picking the Right Language Identifier
Describes how to select Unicode language identifiers.
Useful reference links
IANA Language Subtag Registry
This is the official location where you will find all subtags available for use in language tags.
Language Subtag Lookup tool
User friendly interface to IANA's language tag registry. Provides for checking of subtags as well as lookup. Up-to-date with latest version of BCP 47.
BCP 47
Points to a document containing both RFC 5646 (Tags for the Identification of Languages) and RFC 4647 (Matching of Language Tags).
RFC 5646 Tags for the Identification of Languages
The specification that describes language tag syntax.
RFC 4647 Matching of Language Tags
The specification that describes alternative ways of matching language tags.
Language Tags
Provides various useful links about language tags and a good place to find up-to-date information.
Spec links
HTML, 3.2.6.2 The lang and xml:lang attributes
The language attributes in HTML.
Specifying the language of content: the lang attribute
lang in the HTML 4.01 spec (section 8.1)
Language Identification
xml:lang in the XML spec (section 2.12)
ISO 3166: Codes for Country Names
ISO country codes
ISO 639: Codes for the Representation of Names of Languages
ISO language codes
RFC 4646 Tags for the Identification of Languages
[Of historic interest only] An earlier version of the specification that describes language tag syntax.
RFC 3066 Tags for the Identification of Languages
[Of historic interest only] The previous IETF document that used to define how to use language tags to identify languages, now obsolete.
Understanding the New Language Tags
[Of historic interest only] Overview of planned improvements for RFC3066bis by one of its authors.
Declaring metadata about the language(s) of the intended audience
See also
This section is specifically about setting metadata for the document as an object. For information about declaring the language of the document for text-processing purposes, see Declaring the overall language of a page.
For detailed advice about how to select the right language tags, see Choosing language tags.
Consider using a Content-Language HTTP header to declare metadata about the language(s) of the intended audience of a document.
Where a document contains content aimed at speakers of more than one language, use the HTTP Content-Language header with a comma-separated list of language tags.
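Building and reading such a header value is simple string handling. A Python sketch with hypothetical helper names, using a page aimed at German, French and Italian speakers as the example; how you attach the header to a response depends on your server and is not shown.

```python
from typing import Iterable, List


def content_language_value(tags: Iterable[str]) -> str:
    """Build a Content-Language header value from audience language tags."""
    return ", ".join(tags)


def parse_content_language(value: str) -> List[str]:
    """Split a Content-Language header value into language tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]
```

For the example page, the response would carry Content-Language: de, fr, it.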
How to's
HTTP headers, meta elements and language information
For HTML, should we put language declarations in HTTP headers and meta elements, and how are they different from those in language attributes?
Declaring language in HTML
How should I set the language of the content in my HTML page? Includes:
Specifying metadata about the audience language
Talks about using HTTP headers to provide metadata.
Background reading
Types of language declaration
Describes two different types of language information, 'metadata' and 'text-processing', and how they differ.
Spec links
HTML5, 4.2.5.3 Pragma directives
How HTML5 deals with a meta element with http-equiv set to Content-Language.
HTTP 1.1, 14.12 Content-Language
The Content-Language HTTP header described in the HTTP/1.1 specification.
HTML 4.01, 8.1 Specifying the language of content: the lang attribute
Mentions Content-Language in the HTML 4.01 specification, but only to say that a language attribute on the html element takes precedence over it.
Indicating the language of a link destination
See also
For advice about how to select the right language tags, see Choosing language tags.
See also Linking to localized content in the Navigation section.
When pointing to a resource in another language, consider the pros and cons before indicating the language of the target document.
If you want to indicate that the target document of an a element is in another language, consider the pros and cons before using the hreflang attribute with CSS to display that information.
Do not use flag icons to indicate languages.
How to's
Indicating the language of a link destination
What should I bear in mind if I want to indicate to the reader that a link points to a page in a different language?
About languages and flags
Why country flags as symbols for languages are problematic, and what you should do instead.
Spec links
HTML5, 4.12.2 Links created by a and area elements
hreflang in the HTML5 spec
CSS 2.1, 12.1 The :before and :after pseudo-elements
:before and :after in the CSS 2.1 spec
HTML 4.01, 12.2 The A element
hreflang in the HTML 4.01 spec.
Setting & changing browser language preferences
How to's
Setting language preferences in a browser
How do I check or change the language settings of my browser?
Internationalization checker
Tells you what your browser's Accept-Language header is currently set to. (See the bottom of the information table.)
Useful reference links
Content Negotiation
Apache documentation on content negotiation.
Debian web site in different languages
How to set language preferences in a variety of legacy versions of browsers.
Using Accept-Language for locale setting
How to's
Accept-Language used for locale setting
Is it a good idea to use the HTTP Accept-Language header to determine the locale of the user?
Date formats, Option Three: Use the Accept-Language HTTP header
How do I prepare my web pages to display varying international date formats?
Markup & text
Getting started
How to's
Quick tips: Presentation vs. content
One of the 10 quick tips for internationalization.
Quick tips: Text authoring
One of the 10 quick tips for internationalization.
Using b and i tags
Use the class attribute on a b or i element to identify why the element is being used.
Consider whether other elements might be more applicable than the b or i element because they carry the right semantics.
How to's
Using b and i elements
Should I use b and i elements, and if so, what do I need to know?
Spec links
HTML5, 4.5.17 The i element
The i element in the HTML5 spec
HTML5, 4.5.18 The b element
The b element in the HTML5 spec
Using ruby markup
See also
This section is specifically about how to use markup for ruby annotations. For information about styling ruby, see Styling ruby text.
How to's
Ruby markup
Discusses how to use ruby markup in HTML5, and has pointers to what currently works in browsers.
Useful reference links
Language enablement index: Inline notes & annotations
Links to requirements for inline notes & annotations in the language enablement index.
Background reading
What is ruby?
What are 'ruby' annotations?
Bopomofo on the Web
A summary of how bopomofo is used and the implications for support on the Web.
Use Cases & Exploratory Approaches for Ruby Markup
Discussion about what is needed in the HTML5 specification, and possibly other markup vocabularies, to adequately support ruby markup. It looks at a number of use cases and how well they are supported by the various markup models.
CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7).
Spec links
HTML5
4.6.21 The ruby element
The ruby element in the HTML5 spec
4.6.22 The rt element
The rt element in the HTML5 spec
4.6.23 The rp element
The rp element in the HTML5 spec
HTML Ruby Markup Extensions
Proposed extensions to the HTML5 markup model.
Tests
HTML5, the ruby element and its children
Ruby extension markup
Looks at various possible models for marking up ruby, with a view to informing discussion about the models found in the HTML5 draft of 25 October 2012 and the proposed Ruby Extension Spec as of 25 February 2013.
Ruby Annotation Recommendation
W3C Recommendation that defines markup for ruby, in the form of an XHTML module. The HTML5 markup model should eventually replace this specification.
CSS3 Ruby Module
W3C Working Draft that defines how ruby elements can be styled in various ways. This draft is likely to change significantly as it is reworked to support the HTML5 markup model. (For more about styling ruby, see Styling ruby text.)
XHTML 1.1, 3. The XHTML 1.1 Document Type
Ruby Annotation in the XHTML 1.1 spec (bottom of the page)
Implementing the Ruby Module
Sample module implementations of the Ruby Annotation Specification in several schemas (W3C Personal Note)
Working with form controls
See also
In the Characters section, see Handling encoding issues in forms.
In the Text Direction section, see Managing text direction in form controls.
In the Styling & layout section, see Working with names and Working with date formats.
How to's
Sorting select options
As part of a form, I have a list of terms in a drop-down box. Why are they not correctly sorted when I translate the items in the list?
Working with strings in JavaScript & databases
Use a topic-comment approach whenever possible.
Avoid sentence-like arrangements when they contain substrings that are predefined translatable text or numeric text.
Use sentence-like arrangements with care if you have non-numeric and non-translatable text substrings (i.e. text created at runtime).
Where the parts of a composite message appear in separate locations, provide the translator with contextual information that shows how the various parts relate to each other.
Provide information to the translator, where needed, to clarify what a substring represents.
When requested by the localization group, be prepared to provide information about the size of each substring.
Strings should be reused only where the text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.
Reused strings must not refer to more than one text, graphic or conceptual context.
If in doubt as to whether a string is a good candidate for reuse, don't reuse it.
If reused strings will be displayed in fixed-size display areas of varying sizes, ensure that every translation fits in the smallest of those areas.
How to's
Working with Composite Messages
Why you need to be very careful about splitting up and reusing text on-screen. The linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve.
Re-using Strings in Scripted Content
Things to be aware of if you plan to use the same text string in different places on your site or user interface.
Useful reference links
(or, Variables in Interface Language)
Article by Chris Noessel illustrating a number of examples where composite messages can cause problems.
Indicating what should and should not be translated
Use the translate attribute on an element to prevent its content from being translated by online translation services or by computer-assisted translation tools.
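A minimal sketch of the attribute in use. The value is inherited, so everything inside an element with translate="no" is excluded from translation unless a descendant sets translate="yes". The product name FooSync Pro and the code sample are hypothetical.

```html
<!-- The (hypothetical) product name stays untranslated; the rest of the sentence is translatable -->
<p>Download <span translate="no">FooSync Pro</span> for free.</p>

<!-- translate is inherited: nothing inside this pre element should be translated -->
<pre translate="no"><code>console.log("Hello");</code></pre>
```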
How to's
Using HTML's translate attribute
What is the translate attribute for, and how should I use it?
Spec links
HTML5, 3.2.3.4 The translate attribute
The translate attribute in the HTML5 spec
Tests
HTML5, the translate attribute and online services
HTML5, the translate attribute
HTML5 adds new translate attribute
Blog post describing the translate attribute, how it works, and why it is needed.
Styling & layout
Getting started
How to's
Quick tips: Presentation vs. content
Preparing for text expansion during translation
Ensure that your graphic backgrounds can automatically expand with the text they are related to, avoid highly constrained spaces, and anticipate that the box containing your text may grow during translation.
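One way to apply this, sketched under the assumption that the design allows a CSS gradient in place of a fixed-size bitmap: give the box padding but no fixed width or height, so it grows with its label. The class name cta and the labels are illustrative.

```html
<style>
  .cta {
    display: inline-block;
    /* Padding instead of a fixed width or height: the box grows with the translated label */
    padding-block: 0.5em;
    padding-inline: 1.25em;
    border-radius: 0.4em;
    /* A gradient stretches to any box size, unlike a fixed-size background image */
    background: linear-gradient(#4a90d9, #2a6db3);
    color: white;
  }
</style>
<a class="cta" href="#">Sign up now</a>
<a class="cta" href="#" lang="de">Registrieren Sie sich jetzt</a>
```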
How to's
Background images that support localization
How can I ensure that when text expands in translation the background images will still work?
Background reading
Text size in translation
Overview of text expansion issues.
Display capabilities
Do I need to worry because display capabilities (screen sizes, number of colors, etc.) of computers vary in other countries?
Sliding Doors of CSS
Douglas Bowman's article in A List Apart about how to layer background images, allowing them to slide over each other to create certain effects. (A note from the editors: While brilliant for its time, this article no longer reflects modern best practices.)
Styling by language
See also
Related sections include Using attributes to declare language and Choosing language values.
Use :lang() to set language-specific styling.
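A minimal sketch of language-specific styling with :lang(). Unlike an attribute selector on lang, :lang(ja) also matches elements that only inherit the language from an ancestor, and it matches subtags such as ja-JP. The font names are examples of fonts commonly found on macOS and Windows, not a recommendation, and the text is illustrative.

```html
<style>
  /* Matches lang="ja", lang="ja-JP", and any descendant that inherits Japanese */
  :lang(ja) { font-family: "Hiragino Sans", "Yu Gothic", sans-serif; }
  /* German-style quotation marks for q elements in German content */
  q:lang(de) { quotes: "„" "“" "‚" "‘"; }
</style>
<p lang="ja">日本語のテキスト</p>
<p lang="de">Er sagte <q>Guten Tag</q>.</p>
```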
How to's
Styling using the lang attribute
Compares the :lang() pseudo-class with the lang|= and lang= attribute selectors, for both HTML and XML. Includes:
The :lang() pseudo-class selector
How to use it.
Using CSS selectors in XML with xml:lang
Dealing with namespaces in documents served as XML.
Language tags in HTML and XML
How language tags work and where to find which one to use.
Spec links
Selectors Level 4, 7.2 The Language Pseudo-class: :lang()
Selectors Level 4, 6 Attribute selectors
CSS Namespaces Module
Tests
CSS3 Selectors, language selectors
Using logical property styles
Use CSS logical properties wherever possible, so as to facilitate localization into right-to-left and vertically-set scripts.
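A minimal sketch, using a hypothetical note class: the same rules put the border on the left of the English paragraph and on the right of the Arabic one, with no RTL-specific overrides.

```html
<style>
  .note {
    /* inline-start is the left edge in LTR text and the right edge in RTL text */
    border-inline-start: 4px solid #c60;
    padding-inline-start: 1em;
    margin-block-end: 1em;
    /* inline-size maps to width in horizontal writing modes and to height in vertical ones */
    max-inline-size: 40em;
  }
</style>
<p class="note" lang="en">A note in English.</p>
<p class="note" lang="ar" dir="rtl">ملاحظة باللغة العربية.</p>
```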
How to's
MDN: CSS Logical Properties and Values
An introduction.
Basic concepts of Logical Properties and Values
Introduction to the specification, and explanation of flow relative properties and values.
Logical properties for sizing
Explains the flow-relative mappings between physical dimension properties and logical properties used for sizing elements on our pages.
Logical properties for margins, borders and padding
A look at flow-relative mappings for the various margin, border, and padding properties and their shorthands.
Logical properties for floating and positioning
How to use logical mappings for the physical values of float and clear, and also for the positioning properties used with positioned layout.
Spec links
CSS Logical Properties and Values Level 1
Tests
CSS Logical Properties and Values
Styling counters for lists, etc.
Use the CSS @counter-style rule to define or modify counters used for list markers, figure numbering, chapter headings, etc.
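The sketch below shows two approaches, with hypothetical style names: a numeric counter style built from Extended Arabic-Indic digits (U+06F0 to U+06F9) for an Urdu list, and a style that extends the predefined devanagari style to change only its prefix and suffix. The prefix and suffix characters are illustrative choices, and the Urdu-specific digit shapes may also depend on the font and on lang="ur".

```html
<style>
  /* Hypothetical style: decimal numbering with Extended Arabic-Indic digits (first symbol is zero) */
  @counter-style urdu-digits {
    system: numeric;
    symbols: "۰" "۱" "۲" "۳" "۴" "۵" "۶" "۷" "۸" "۹";
    suffix: ". ";
  }
  ol.urdu { list-style-type: urdu-digits; }

  /* Hypothetical style: reuse the predefined devanagari digits, wrapped in parentheses */
  @counter-style paren-devanagari {
    system: extends devanagari;
    prefix: "(";
    suffix: ") ";
  }
  ol.hindi { list-style-type: paren-devanagari; }
</style>

<ol class="urdu" lang="ur" dir="rtl">
  <li>پہلا</li>
  <li>دوسرا</li>
</ol>

<ol class="hindi" lang="hi">
  <li>पहला</li>
  <li>दूसरा</li>
</ol>
```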
How to's
MDN: @counter-style
How to define your own counter styles when the pre-defined styles aren't fitting your needs.
Ready-made Counter Styles
Cut-and-paste code snippets for a large number of international counter styles that can be used for ordered lists and other such counters.
How to make list markers stand upright in vertical text
Use CSS to make list counters stand upright above vertical text.
Of Urdu digits and CSS counter styles
How to create Urdu counter styles.
Useful reference links
Counter styles converter
Allows you to create and test your own styles, or tweak and test the many code snippets listed in the Ready-made Counter Styles doc.
Language enablement index: Lists, counters, etc
Links to information about lists and counter-styles in the language enablement index.
Spec links
CSS Counter Styles Level 3
CSS Lists and Counters Module Level 3
Tests
Custom counter styles
CSS3 Counter Styles, predefined styles
CSS3 Counter Styles
CSS3 and International Text: Lists
Preview of upcoming proposals for CSS3 written in 2003.
Managing line breaks
See also
Hyphenation affects line-breaking, but has its own section here. Line-breaking behaviour is also closely associated with justification; for the latter, see Justifying & aligning text.
Since default line-breaking rules vary by language, always label your content correctly for language.
How to's
MDN: word-break
Specifies whether the browser should insert line breaks wherever the text would otherwise overflow its content box due to a lack of spaces. Particularly useful for Chinese, Japanese, and Korean. Values include break-all and keep-all.
MDN: line-break
For Chinese, Japanese, or Korean (CJK), specifies how (or if) to break lines when working with punctuation and symbols. Values include strict, normal, loose, and anywhere.
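A minimal sketch of language-targeted line-breaking rules, with illustrative text: word-break: keep-all stops the browser from breaking inside Korean words, which are separated by spaces, and line-break: strict requests the most restrictive breaking rules for Japanese. The :lang() selectors only work if the lang attribute is set.

```html
<style>
  /* Korean: break only at spaces, keeping each Hangul word intact */
  :lang(ko) { word-break: keep-all; }
  /* Japanese: use the most stringent rules for breaks around punctuation and small kana */
  :lang(ja) { line-break: strict; }
</style>
<p lang="ko">한국어 문장은 띄어쓰기로 단어를 구분합니다.</p>
<p lang="ja">これは日本語の文章です。</p>
```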
Useful reference links
Language enablement index: Line-breaking
Links to information about line-breaking in the language enablement index.
Background reading
Approaches to line breaking
High level summary of various typographic strategies for wrapping text at the end of a line, for a variety of scripts.
🎥 The Hidden Rules of Wrapping Text on the Web (video)
Summary of the diverse typographic strategies of text wrapping.
Spec links
CSS Text Module Level 3, 5. Line Breaking and Word Boundaries
CSS Text Module Level 3, 6. Breaking Within Words
Tests
CSS3 Text, Line breaking, BA, OP, CL and NS
CSS3 Text, Non-tailorable line breaking
CSS3 Text, word-break
Japanese & Chinese line breaks
CSS3 and international text: Line breaking
Preview of upcoming proposals for CSS3 written in 2003.
Hyphenation
See also
This section is specifically about hyphenation. For more general information about line breaking, see Managing line breaks.
CSS hyphenation only works if content is labelled for language, and hyphenation rules are language-specific, so always label your content for language and make sure the label is correct.
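A minimal sketch: hyphens: auto only takes effect when the browser has hyphenation rules for the element's language, which it identifies through the lang attribute. The narrow class and the sample sentences are illustrative.

```html
<style>
  .narrow {
    max-inline-size: 10em; /* narrow column, so hyphenation is likely to be needed */
    hyphens: auto;         /* hyphenate using the rules for the element's language */
  }
</style>
<!-- lang tells the browser which language's hyphenation rules to apply -->
<p class="narrow" lang="de">Donaudampfschifffahrtsgesellschaft ist ein sehr langes Wort.</p>
<p class="narrow" lang="en">Internationalization makes localization considerably easier.</p>
```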
How to's
All you need to know about hyphenation in CSS
The various CSS properties and settings to manage hyphenation.
MDN: hyphens
Specifies how and whether words should be hyphenated when text wraps across multiple lines. Also includes a table of supported languages in browsers.
Useful reference links
Language enablement index: Hyphenation
Links to information about hyphenation in the language enablement index.
Spec links
CSS Text Module Level 3, 5.4 Hyphenation: the hyphens property
CSS Text Module Level 4, 8. Breaking Within Words
Tests
CSS Text, hyphens
Justifying and aligning text
See also
Justification behaviour is closely associated with line-breaking and hyphenation. For more information on those topics, see Managing line breaks and Hyphenation.
Wherever possible, use start and end values for the CSS text-align property, rather than left and right. Only use left and right on the rare occasions when the alignment has to remain as is, regardless of language.
Only use text-align when you really need to override the alignment produced by the current base direction. Don't litter your markup or stylesheet with unnecessary alignment calls.
Avoid using HTML attributes with values of left and right. Instead, add selectors to your CSS stylesheet. This allows you to use logical properties, and also makes it much easier to change things during localisation.
Use CSS property names that include the words 'start' and 'end', rather than 'left', 'right', 'top', and 'bottom'; for example, margin-inline-start and margin-block-start.
Don't assume that all writing systems prefer a ragged edge at the line end. Fully justified text is the preferred default for some scripts/languages.
Since justification rules vary by language, always label your content correctly for language.
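A minimal sketch applying these guidelines to a right-to-left table; the table content and the price class are illustrative. Table header cells are centered by default, so this is a case where overriding the alignment is genuinely needed, and start keeps the override direction-aware.

```html
<style>
  /* th is centered by default; align it with the start edge instead
     (the left edge in LTR tables, the right edge in RTL tables) */
  th { text-align: start; }
  /* Hypothetical price column aligned to the end edge */
  td.price { text-align: end; }
</style>
<!-- Avoid <th align="left">: it stays on the left even in RTL tables -->
<table lang="he" dir="rtl">
  <tr><th>מוצר</th><th>מחיר</th></tr>
  <tr><td>ספר</td><td class="price">42</td></tr>
</table>
```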
How to's
MDN: text-align
Specifies the horizontal alignment of inline-level content inside a block or table-cell box, including the value justify, which is used to turn on justification.
MDN: text-justify
Defines what type of justification should be applied to text when it is justified (ie. when text-align:justify is set). Values include inter-word and inter-character.
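A minimal sketch of justification for Japanese, with illustrative text. A browser that does not support text-justify ignores that declaration and still justifies the text using its own default method.

```html
<style>
  /* Fully justify Japanese paragraphs, distributing extra space between characters */
  p:lang(ja) {
    text-align: justify;
    text-justify: inter-character;
  }
</style>
<p lang="ja">これは両端揃えの段落です。</p>
```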
Useful reference links
Language enablement index: Justification & line-end alignment
Links to information about justification in the language enablement index.
Background reading
Approaches to full justification
High level summary of various typographic strategies for fully justifying text on a line and in a paragraph for a variety of scripts, and some advice for authors and implementers.
Spec links
CSS Text Module Level 3, 6. Alignment and Justification
CSS Text Module Level 3, 8. Spacing
Tests
CSS3 Text, text-align:justify
CSS3 Text, text-justify
CSS3 and international text: Line breaking
Preview of upcoming proposals for CSS3 written in 2003.
CSS3 and international text: Text spacing
Preview of upcoming proposals for CSS3 written in 2003.
Creating vertical text
How to's
Styling vertical Chinese, Japanese, Korean and Mongolian text
How to use CSS to create vertical text, and what is currently supported. Includes:
Basic setup
Use writing-mode to achieve the basic direction.
Changing the glyph orientation for embedded text
How to make non-native text stand upright, rather than flow down the page.
Horizontal-in-vertical text
Make numbers and short texts run horizontally within the vertical line.
Forms, lists and tables
Working with forms, lists and tables.
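A minimal sketch of vertical Japanese text, with illustrative class names and text. For traditional Mongolian, the writing-mode value would be vertical-lr instead, since its columns are stacked left to right.

```html
<style>
  /* Characters run top to bottom; successive columns are stacked right to left */
  .tategaki {
    writing-mode: vertical-rl;
    inline-size: 15em; /* in a vertical writing mode this sets the column height */
  }
  /* Set an embedded Latin word upright instead of rotated sideways */
  .upright { text-orientation: upright; }
  /* Fit a short number horizontally into a single vertical cell (tate-chu-yoko) */
  .tcy { text-combine-upright: all; }
</style>
<p class="tategaki" lang="ja">
  <span class="tcy">12</span>月<span class="tcy">25</span>日に<span class="upright">CSS</span>で縦書きを試す。
</p>
```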
Spec links
CSS Writing Modes Module Level 3
Other links
Text direction
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Styling text decorations
How to's
Styling underlines
Ways in which CSS can be used to manage underline positioning for non-Latin scripts.
Spec links
CSS Text Decoration Module, Line Decoration: Underline, Overline, and Strike-Through
Styling ruby text
See also
This section is specifically about styling ruby text. For more information about markup for ruby, see Using ruby markup.
How to's
Ruby Styling
Discusses how to use CSS styling to affect the rendering of ruby content.
MDN: ruby-align
Defines the distribution of the different ruby elements over the base.
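A minimal sketch, with illustrative text: ruby-position moves annotations below the base text, and ruby-align controls how an annotation is distributed over its base. Both properties are inherited; check browser support before relying on either.

```html
<style>
  /* Annotations inside this ruby element go below the base text */
  ruby.below { ruby-position: under; }
  /* Center each annotation over its base (the initial value is space-around) */
  ruby { ruby-align: center; }
</style>
<p lang="ja">
  <ruby>漢<rt>かん</rt>字<rt>じ</rt></ruby>
  <ruby class="below">東京<rt lang="ja-Latn">Tōkyō</rt></ruby>
</p>
```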
Useful reference links
Language enablement index: Inline notes & annotations
Links to requirements for inline notes & annotations in the language enablement index.
Background reading
Ruby
What is 'ruby'?
CJKV Information Processing
Useful information about ruby in general (Ken Lunde's book, CJKV Information Processing, ISBN 1-56592-224-7, especially chapters 6 and 7)
Spec links
CSS3 Ruby Module
Tests
CSS3 Ruby
Includes tests for ruby-position, ruby-align, ruby-merge, and ruby autohide.
Ruby style
Introduction to styling ruby with CSS3 Ruby Module. In W3C article, Ruby Markup and Styling.
Applying various script-specific typographic conventions
Other links
Document grids
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Kumimoji and warichu
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Emphasis
Preview of upcoming proposals for CSS3. In W3C article, CSS3 and International Text.
Using fonts & webfonts
How to's
How to Use Cross Browser Web Fonts
Useful tutorial on how to use webfonts, and some things to look out for.
Fonts supplied with Windows and macOS, by script
Lists of fonts provided by the Windows 10 and macOS operating systems, as well as Google's Noto fonts and SIL fonts, grouped by script. Useful to set font-family styles for CSS.
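A minimal sketch of a language-specific font stack with a webfont first. The font file path is hypothetical; Geeza Pro (macOS) and Arabic Typesetting (Windows) are examples of system fonts that may be present, with a generic family as the last fallback.

```html
<style>
  /* Hypothetical self-hosted copy of a Noto font; adjust the path to your own files */
  @font-face {
    font-family: "Noto Naskh Arabic";
    src: url("fonts/NotoNaskhArabic-Regular.woff2") format("woff2");
    font-display: swap; /* show fallback text while the webfont loads */
  }
  /* Webfont first, then fonts that may be installed with the OS, then a generic family */
  :lang(ar) { font-family: "Noto Naskh Arabic", "Geeza Pro", "Arabic Typesetting", serif; }
</style>
<p lang="ar" dir="rtl">مرحبا بالعالم</p>
```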
Working with date formats
How to's
Date formats
How do I prepare my web pages to display varying international date formats?
Working with personal names
Ask yourself whether you really need to have separate fields for given name and family name.
Make input fields long enough to enter long names, and ensure that if the name is displayed on a web page later there is enough space for it.
Avoid limiting the field size for names in your database.
Try to avoid using the labels 'first name' and 'last name' in non-localized forms.
Consider whether it would make sense to have one or more extra fields, in addition to the full name field, where you ask the user to enter the part(s) of their name that you need to use for a specific purpose.
Ask separately, when setting up a profile for example, how that person would like you to address them.
If you have separate fields for parts of a person's name, label clearly which parts you want where.
Be careful about assumptions built into algorithms that pull out the parts of a name automatically.
Be as clear as possible about telling people how to specify their name.
Don't assume that a single-letter name is an initial.
Don't require that people supply a family name.
Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names.
Don't require names to be entered all in upper case.
Allow the user to enter a name with spaces.
Don't assume that members of the same family will share the same family name.
It may be better for a form to ask for 'Previous name' rather than 'Maiden name' or 'née'.
If you want names in Latin or ASCII characters only, you need to tell the user.
You may want to store the name in both Latin and native scripts, in which case you probably need to ask the user to submit their name in both native script and Latin-only form, using separate fields.
If you do accept non-ASCII names, you should use a Unicode character encoding (eg. UTF-8) in your pages, your back-end databases and in all the software code in between.
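A minimal sketch of a form that follows several of these guidelines; the action URL and field names are hypothetical, and it assumes the page itself is served as UTF-8.

```html
<form action="/profile" method="post" accept-charset="UTF-8">
  <!-- One full-name field with no maxlength and no pattern, so long names, spaces,
       hyphens, apostrophes and non-Latin scripts are all accepted -->
  <label for="full-name">Full name</label>
  <input id="full-name" name="full-name" type="text" size="40" autocomplete="name">

  <!-- Ask separately how the person would like to be addressed -->
  <label for="address-as">What should we call you?</label>
  <input id="address-as" name="address-as" type="text" size="20" autocomplete="nickname">

  <button type="submit">Save</button>
</form>
```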
How to's
Personal names around the world
How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?
Bidirectional text
Getting started
How to's
Unicode Bidirectional Algorithm basics
A gentle introduction to how the bidi algorithm works.
Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts
A tutorial that gathers together and organizes pointers to articles that, taken together, help you understand the essential aspects of how to work with languages in right-to-left scripts and bidirectional text when authoring HTML and CSS.
Quick tips: Right-to-left text
One of the top 10 quick tips for internationalization is about right-to-left text.
Languages using right-to-left scripts
Lists 12 scripts and over 200 languages using RTL orthographies in the modern day, plus rough Ethnologue data on countries & speaker numbers.
Working with source code markup and code examples for RTL scripts
Discusses challenges of editing markup for pages in Arabic, Hebrew, and other RTL languages, and creating code examples with bidirectional text.
Setting up a right-to-left page
See also
This section is about setting up the default direction for a whole page. For information about working with text direction changes inside the document, see Changing the direction of a block element and Mixing text direction inline.
Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
Add dir="rtl" to the html tag any time the overall document direction is right-to-left.
Don't add dir="rtl" to the body tag.
If you need to avoid the scroll bar moving on some browsers, put dir on the head element and on a div just inside the body element.
Use logical ordering, not visual ordering, for Hebrew, and choose an appropriate encoding.
If you have to use an ISO encoding for a Hebrew page, declare the encoding as ISO-8859-8-i rather than ISO-8859-8.
Do not use CSS styling to control directionality in HTML. Use markup instead.
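A minimal sketch of a Hebrew page, with illustrative text: dir and lang go on the html element, body has no dir of its own, and the file is assumed to be saved as UTF-8 with the Hebrew stored in logical order.

```html
<!DOCTYPE html>
<html lang="he" dir="rtl">
<head>
  <meta charset="utf-8">
  <title>דף לדוגמה</title>
</head>
<body>
  <!-- No dir on body: it inherits rtl from the html element -->
  <h1>שלום עולם</h1>
  <p>הטקסט הזה מיושר לימין.</p>
</body>
</html>
```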
How to's
Text direction and structural markup in HTML
How to use the dir attribute and handle alignment. Includes:
Setting direction at the document level
Using dir on the html tag to set the default direction of the document.
Working with browsers that change the browser chrome
Workarounds if you don't want the browser to change the UI when dir is set on the html tag.
Visual vs. logical ordering of text
What is the difference between visual and logical ordering of text, and which should I use?
Spec links
HTML5, 3.2.3.6 The dir attribute
Tests
HTML5, dir basics
HTML5, native user interfaces
Setting direction on block elements
See also
For information about setting up the default direction for a whole page, see Setting up a right-to-left page. See also Managing text direction in form controls.
Add the dir attribute to a block element to change its base direction.
Do not use CSS styling to control directionality in HTML. Use markup instead.
Only use bidi markup to set the base direction for the document as a whole, or where you need to change the base direction.
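A minimal sketch, with illustrative text: the page stays left-to-right, and only the quotation's block gets dir="rtl", so the exclamation mark is displayed at the left end of the Arabic sentence.

```html
<!-- The surrounding page is left-to-right; only this quotation changes base direction -->
<p>An Arabic greeting:</p>
<blockquote lang="ar" dir="rtl">
  <p>مرحبا بالعالم!</p>
</blockquote>
<!-- Avoid <blockquote style="direction: rtl">: use markup, not CSS, to set direction -->
```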
How to's
Text direction and structural markup in HTML
How to use the dir attribute and handle alignment. Includes:
Setting direction on block elements
How to use the dir attribute and handle alignment.
Working with tables
Particular advice for working with tables.
Handling content whose direction is not known in advance
Besides form-related information, how to insert text into a page with the right base direction, using HTML5 features.
Displaying bidi text in the textarea and pre elements
How dir=auto affects elements with multiple paragraphs of plain text.
Unicode controls vs. markup for bidi support
To correctly format bidi text in HTML or XML content, should I use Unicode control codes or markup?
CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages?
Spec links
HTML5, 3.2.3.6 The dir attribute
Tests
HTML5, dir basics
HTML5, the pre element
Managing text direction in form controls
See also
See also Setting direction on block elements.
Add dir="auto" to input tags to automatically align text to the correct side of an input field.
Add dir="auto" to textarea and pre tags to make each paragraph align to the left or right according to its initial strong character.
Consider using the dirname attribute to pass information to the server about the direction of text in a text or search form control.
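A minimal sketch of the three guidelines together; the form action and field names are hypothetical. With dirname, the submitted data gains an extra field, named by the attribute's value, whose value is ltr or rtl. Browser support for dirname may vary, particularly on textarea, so check it before depending on the extra field.

```html
<form action="/comments" method="post">
  <!-- dir="auto": the field takes its direction from the first strong character entered.
       dirname: the browser also submits a field named "title.dir" with the value ltr or rtl -->
  <label for="title">Title</label>
  <input id="title" name="title" type="text" dir="auto" dirname="title.dir">

  <!-- In a textarea, dir="auto" lets each paragraph take its own direction -->
  <label for="body">Comment</label>
  <textarea id="body" name="body" rows="5" dir="auto" dirname="body.dir"></textarea>

  <button type="submit">Send</button>
</form>
```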
How to's
Text direction and structural markup in HTML
How should I use the dir attribute to set text direction on structural elements in HTML? Includes:
Correcting display of opposite-direction text in the input element
HTML5 techniques for getting the cursor and text to the right side of the input element.
Displaying bidi text in the textarea and pre elements
Using dir=auto in HTML5 to assign direction to each paragraph independently.
Reporting direction to the server
Using HTML5's dirname attribute to pass direction information to the server.
Setting direction on forms explicitly
Keystrokes that make browsers set the direction of form entry fields.
Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
Spec links
HTML5, 3.2.3.6 The dir attribute
HTML5, 4.10.7.3.2 The dirname attribute
Tests
HTML5, dir=auto & bdi
HTML5, the textarea element
HTML5, the dirname attribute
Mixing text direction inline
See also
Related sections include Handling parentheses and other mirrored characters and Overriding the Unicode bidirectional algorithm.
Tightly wrap every opposite-direction phrase in markup that sets its base direction.
If you know the phrase's direction, wrap it in an element with a dir attribute. If you don't already have an element around the text, use span or bdi.
If you don't know the phrase's direction (ie. unknown text that will be injected at run time), either wrap the phrase in bdi (no dir attribute needed) or, if the phrase is already tightly wrapped by an element, just add dir="auto" to that element.
To bulletproof the code for Edge or legacy browsers, if the tightly wrapped phrase is followed inline (possibly after some intervening neutral characters) by a number, or is one of a list of separate phrases with the same direction, add a directional mark (RLM or LRM) immediately after the markup of that phrase.
Only use Unicode control characters for bidirectional control in attribute text or element text that allows no internal markup.
Consider using Unicode control characters to set the base direction around bidirectional text that will be displayed as tooltips, page titles, or in JavaScript dialog boxes.
Do not leave white space at the end of inline elements that mark a directional boundary.
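A minimal sketch of the three cases, using illustrative text; the Arabic name stands in for text injected at run time. In the second item, bdi keeps the following colon and digit from being drawn into the Arabic run, which would otherwise make the line display in a confusing order.

```html
<ul>
  <!-- Direction known: wrap the phrase in an element with dir -->
  <li>The title is <span lang="he" dir="rtl">שלום עולם</span> in Hebrew.</li>
  <!-- Direction unknown (user name inserted at run time): bdi isolates it, no dir needed -->
  <li>User <bdi>إيان</bdi>: 3 posts</li>
  <!-- Phrase already tightly wrapped by an element: just add dir="auto" -->
  <li>Search term: <span class="term" dir="auto">إيان</span></li>
</ul>
```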
How to's
Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
Handling inline bidirectional text in HTML
Brief steps for marking up any type of inline bidirectional text. Following sections give worked examples.
What if I can't use markup?
Use Unicode control characters where markup isn't allowed.
Bidi space loss
Why does my browser collapse spaces between Latin and Arabic/Hebrew text?
CSS vs. markup for bidi support
Should I use CSS or markup to correctly format Unicode-based bidirectional (bidi) text in HTML and XML-based markup languages?
Unicode controls vs. markup for bidi support
To correctly format bidi text in HTML or XML content, should I use Unicode control codes or markup?
Using Unicode controls for bidi text
If I'm unable to use markup to correctly order bidirectional text, what can I do?
RTL rendering of LTR scripts
Ways to produce runs of right-to-left text for languages such as Chinese, Japanese, Egyptian hieroglyphs, Tifinagh, Old Norse runes, and a good number of other now-archaic scripts.
Spec links
HTML5, 3.2.3.6 The dir attribute
HTML5, 4.6.23 The bdi element
Tests
HTML5, dir=auto & bdi
HTML5, the br element
Handling parentheses and other mirrored characters
Treat mirrored characters as if the word 'left' in the character name meant 'opening', and 'right' meant 'closing'. For example, treat U+0028 LEFT PARENTHESIS as an opening parenthesis.
How to's
Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
Mirrored characters
Understanding how parentheses and other mirroring characters work in bidirectional text.
Overriding the Unicode bidirectional algorithm
Use the bdo element to force the directionality of a sequence of inline characters.
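A minimal sketch: bdo overrides the bidi algorithm, so the Latin letters below are displayed right to left, in exactly the reverse of their stored order.

```html
<!-- Stored as "abc", displayed as "cba": bdo forces right-to-left display of every character -->
<p><bdo dir="rtl">abc</bdo></p>
```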
How to's
Inline markup and bidirectional text in HTML
How to use markup in HTML4 and new HTML5 features for inline bidirectional text. Introduces you gently to the bidi algorithm, if you need that. Includes:
Overriding the bidi algorithm
How to disable the bidi algorithm, when needed.
Spec links
HTML5, 4.6.24 The bdo element
Tests
HTML5, the bdo element
Getting started
Background reading
Quick tips: Navigation
One of the top 10 quick tips for internationalization.
Monolingual vs. multilingual web sites
What are the trade-offs between international sites that are monolingual vs. multilingual?
International & multilingual web sites
What is an "international" or a "multilingual" web site?
Linking to localized content
See also
See also Indicating the language of a link destination in the Language section.
Use server-based, language-related content negotiation to point the user to the page that matches their browser preferences, but also add links to each page so that the user can change languages easily if they prefer.
Consider how to indicate to the user where the in-page language links are, and if the page is available in a long list of languages, consider whether or not to use something like a select control (and if so, how to make it obvious what its function is).
Locate pull-down menus or selection lists at or near the top of the page.
Use a recognizable image alongside a pull-down menu to indicate that it is a control which will take the user to localized pages. Do not use text.
Consider using the size attribute to display the first set of options in a select control.
Translate the links or options into the target language.
Encode your page as UTF-8, so that it supports the necessary characters.
Decide whether it is a problem that a user won't have fonts for all the list items or menu options. If it is, use JavaScript menus or some other graphic-based approach.
Decide whether to add a description alongside each option, using the language of the current page, so that users can tell what the native word means.
Find the most appropriate way of ordering the list of options.
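A minimal sketch of in-page language links, with hypothetical URLs: each link text is the language's own name, hreflang describes the language of the target page, and lang on each link lets browsers and assistive technology handle the name correctly.

```html
<nav aria-label="Language">
  <ul>
    <!-- Each language name is written in that language (hypothetical URLs) -->
    <li><a href="/en/" hreflang="en" lang="en">English</a></li>
    <li><a href="/fr/" hreflang="fr" lang="fr">Français</a></li>
    <li><a href="/ja/" hreflang="ja" lang="ja">日本語</a></li>
    <li><a href="/ar/" hreflang="ar" lang="ar" dir="rtl">العربية</a></li>
  </ul>
</nav>
```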
How to's
Guiding users to translated pages
If my site contains alternative language versions of the same page, what can I do to help the user see the page in their preferred language?
Using