XHTML™ Basic 1.1 - Second Edition
W3C Proposed Edited Recommendation 7 October 2010
This version:
Latest version:
Previous version:
Diff-marked from previous version:
xhtml-basic-rec-diff.html
Editor:
Shane McCarron, Applied Testing and Technology, Inc., shane@aptest.com
Version 1.1 Editors:
Shane McCarron, Applied Testing and Technology, Inc.
Masayasu Ishikawa (until March 2007 while at W3C)
Version 1.0 Editors:
Mark Baker, Sun Microsystems
Masayasu Ishikawa (until March 2007 while at W3C)
Shinichi Matsui, Panasonic
Peter Stark, Ericsson
Ted Wugofski, Openwave Systems
Toshihiko Yamakami, ACCESS Co., Ltd.
Please refer to the errata for this document, which may include some normative corrections. See also translations.
This document is also available in these non-normative formats: PostScript version, PDF version, ZIP archive, and Gzip'd TAR archive.
Copyright W3C (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
Abstract
The XHTML Basic document type includes the minimal set of modules required to be an XHTML host language document type, and in addition it includes images, forms, basic tables, and object support. It is designed for Web clients that do not support the full set of XHTML features; for example, Web clients such as mobile phones, PDAs, pagers, and set-top boxes. The document type is rich enough for content authoring.
XHTML Basic is designed as a common base that may be extended. The goal of XHTML Basic is to serve as a common language supported by various kinds of user agents.
This revision, 1.1 Second Edition, supersedes version 1.1. In this revision, an XML Schema implementation and the lang attribute have been added. In the update from version 1.0 to version 1.1, several new features were incorporated into the language in order to better serve the small-device community that is this language's major user:
XHTML Forms (defined in [XHTMLMOD])
Intrinsic Events (defined in [XHTMLMOD])
The value attribute for the li element (defined in [XHTMLMOD])
The target attribute (defined in [XHTMLMOD])
The style element (defined in [XHTMLMOD])
The style attribute (defined in [XHTMLMOD])
XHTML Presentation module (defined in [XHTMLMOD])
The inputmode attribute (defined in Section 5 of this document)
The document type definition is implemented using XHTML modules as defined in "XHTML Modularization" [XHTMLMOD].
Status of this Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a W3C Proposed Edited Recommendation. If approved, it will supersede the previous edition of XHTML Basic. The only changes in this Proposed Edited Recommendation are to add an XML Schema implementation of the markup language and integrate the lang attribute to increase compatibility with User Agents and Assistive Technologies. A version that shows the specific changes from the previous Recommendation is available in diff-marked form.
Publication as a Proposed Edited Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
W3C Advisory Committee Members are invited to send formal review comments on this Proposed Edited Recommendation to the W3C Team until 11 November 2010. Members of the W3C Advisory Committee will find the appropriate review form for this document by consulting their list of current WBS questionnaires.
This document has been produced by the W3C XHTML2 Working Group as part of the W3C HTML Activity. Please see the Working Group's implementation report.
Please send comments about this document to www-html-editor@w3.org (archive). It is inappropriate to send discussion email to this address. Public discussion may take place on www-html@w3.org (archive).
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Table of Contents
1. Introduction
1.1. XHTML for Small Information Appliances
1.2. Background and Requirements
1.3. Design Rationale
2. Conformance
2.1. Document Conformance
2.2. User Agent Conformance
3. The XHTML Basic Document Type
4. How to Use XHTML Basic
5. XHTML inputmode module
5.1. inputmode Attribute Value Syntax
5.2. User Agent Behavior
5.3. List of Tokens
5.4. Relationship to XML Schema pattern facets
5.5. Examples
6. Acknowledgements
A. References
A.1. Normative References
A.2. Informative References
B. XHTML Basic Document Type Definition
B.1. SGML Open Catalog Entry for XHTML Basic
B.2. XHTML Basic Driver
B.3. XHTML Basic Customizations
C. XHTML Basic XML Schema Definition
C.1. XHTML Basic XML Schema Driver
C.2. XHTML Basic Schema Modules
C.3. XHTML Basic Customizations
1. Introduction
1.1. XHTML for Small Information Appliances
HTML 4 is a powerful language for authoring Web content, but its design does not take into consideration issues pertinent to small devices, including the implementation cost (in power, memory, etc.) of the full feature set. Consumer devices with limited resources cannot generally afford to implement the full feature set of HTML 4. Requiring a full-fledged computer for access to the World Wide Web excludes a large portion of the population from consumer device access of online information and services.
Because there are many ways to subset HTML, there are many almost identical subsets defined by organizations and companies. Without a common base set of features, developing applications for a wide range of Web clients is difficult.
The motivation for XHTML Basic is to provide an XHTML document type that can be shared across communities (e.g. desktop, TV, and mobile phones), and that is rich enough to be used for simple content authoring. New community-wide document types can be defined by extending XHTML Basic in such a way that XHTML Basic documents are in the set of valid documents of the new document type. Thus an XHTML Basic document can be presented on the maximum number of Web clients.
The document type definition for XHTML Basic is implemented based on the XHTML modules defined in XHTML Modularization [XHTMLMOD].
For information on best practices for mobile content, we refer you to [MOBILEBP].
1.2. Background and Requirements
Information appliances are targeted for particular uses. They support the features they need for the functions they are designed to fulfill. The following are examples of different information
appliances:
Mobile phones
Televisions
PDAs
Vending machines
Pagers
Car navigation systems
Mobile game machines
Digital book readers
Smart watches
Existing subsets and variants of HTML for these clients include Compact HTML [CHTML], the Wireless Markup Language [WML], and the "HTML 4.0 Guidelines for Mobile Access" [GUIDELINES]. The common features found in these document types include:
Basic text (including headings, paragraphs, and lists)
Hyperlinks and links to related documents
Basic forms
Basic tables
Images
Meta information
This set of HTML features has been the starting point for the design of XHTML Basic. Since many content developers are familiar with these HTML features, they comprise a useful host language that may be combined with markup modules from other languages according to the methods described in "XHTML Modularization" [XHTMLMOD]. For example, XHTML Basic may be extended with a custom module to support richer markup semantics in specific environments.
It is not the intention of XHTML Basic to limit the functionality of future languages. But since the features in HTML 4 (frames, advanced tables, etc.) were developed for a desktop computer type of client, they have proved to be inappropriate for many non-desktop devices. XHTML Basic will be extended and built upon. Extending XHTML from a common and basic set of features, instead of almost identical subsets or the too-large set of functions in HTML 4, will be good for interoperability on the Web, as well as for scalability.
Compared to the rich functionality of HTML 4, XHTML Basic may look like one step back, but in fact, it is two steps forward for clients that do not need what is in HTML 4 and for content developers who get one XHTML subset instead of many.
1.3. Design Rationale
This section explains why certain HTML features are not part of XHTML Basic.
1.3.1. Presentation
Many simple Web clients cannot display fonts other than monospace. Bi-directional text, bold faced font, and other text extension elements are not supported.
It is recommended that style sheets be used to create a presentation that is appropriate for the device.
1.3.2. Tables
Basic XHTML tables ([XHTMLMOD], section 5.6.1) are supported, but tables can be difficult to display on small devices. It is recommended that content developers follow the Web Content Accessibility Guidelines 1.0 for creating accessible tables ([WCAG10], Guideline 5). Note that in the Basic Tables Module, nesting of tables is prohibited.
1.3.3. Frames
Frames are not supported. Frames depend on a screen interface and may not be applicable to some small appliances like phones, pagers, and watches.
2. Conformance
This section is normative.
2.1. Document Conformance
A Conforming XHTML Basic document is a document that requires only the facilities described as mandatory in this specification. Such a document must meet all of the following criteria:
The document must conform to the constraints expressed in Appendix B and Appendix C.
The root element of the document must be html.
The name of the default namespace on the root element must be the XHTML namespace name, http://www.w3.org/1999/xhtml. The start tag MAY also contain the declaration of the XML Schema Instance Namespace and an XML Schema Instance schemaLocation attribute [XMLSCHEMA]. Such an attribute would associate the XHTML namespace
with the XML Schema at the URI
There must be a DOCTYPE declaration in the document prior to the root element. If present, the public identifier included in the DOCTYPE declaration must reference the DTD found in Appendix B using its Formal Public Identifier. The system identifier may be modified appropriately.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN"
    "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">
The DTD subset must not be used to override any parameter entities in the DTD.
XHTML Basic 1.1 documents SHOULD be labeled with the Internet Media Type "application/xhtml+xml" as defined in [RFC3236]. For further information on using media types with XHTML, see the informative note [XHTMLMIME].
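Informative sketch (not part of the conformance requirements): the following Python fragment illustrates how a tool might spot-check three of the criteria above: a DOCTYPE before the root element, the XHTML Basic 1.1 public identifier, and an html root element in the XHTML namespace. The function name check_basic_shape and the sample document are hypothetical. The sketch does not validate against the DTD or the XML Schema, and because ElementTree reports only the resolved namespace, it cannot tell whether that namespace was declared as the default namespace or through a prefix.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
BASIC_11_FPI = "-//W3C//DTD XHTML Basic 1.1//EN"


def check_basic_shape(source):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    # ElementTree does not expose the DOCTYPE, so inspect the text that
    # precedes the root element's start tag instead.
    prolog = source.split("<html", 1)[0]
    if "<!DOCTYPE" not in prolog:
        problems.append("no DOCTYPE declaration before the root element")
    elif BASIC_11_FPI not in prolog:
        problems.append("DOCTYPE does not use the XHTML Basic 1.1 public identifier")
    root = ET.fromstring(source)
    # ElementTree reports tags as "{namespace}localname".
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("root element is not html in the XHTML namespace")
    return problems


# Hypothetical sample document used only for illustration.
GOOD = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML Basic 1.1//EN"
    "http://www.w3.org/TR/xhtml-basic/xhtml-basic11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head><title>Hello</title></head>
  <body><p>Hello, small device.</p></body>
</html>"""

print(check_basic_shape(GOOD))
```

A document that passes these checks is not necessarily conforming; full conformance still depends on validity against Appendix B and Appendix C.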
2.2. User Agent Conformance
The user agent must conform to the "User Agent Conformance" section of the XHTML 1.0 specification ([XHTML1], section 3.2).
3. The XHTML Basic Document Type
This section is normative.
The XHTML Basic document type is defined as a set of XHTML modules. All XHTML modules are defined in the "XHTML Modularization" specification [XHTMLMOD].
XHTML Basic consists of the following XHTML modules:
Structure Module*
body, head, html, title
Text Module*
abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var
Hypertext Module*
a
List Module*
dl, dt, dd, ol, ul, li
Forms Module
button, fieldset, form, input, label, legend, select, optgroup, option, textarea
Basic Tables Module
caption, table, td, th, tr
Image Module
img
Object Module
object, param
Presentation module
b, big, hr, i, small, sub, sup, tt
Metainformation Module
meta
Link Module
link
Base Module
base
Intrinsic Events module
Events attributes
Scripting module
script and noscript elements
Stylesheet module
style element
Style Attribute Module (Deprecated)
style attribute
Target Module
target attribute
Note:
The target attribute is designed to be a general hook for binding to an external environment (such as Frames, multiple windows, browser-tabbed windows); when there is no such external environment
bound to the user agent, the user agent can ignore the target attribute. When there is an external environment bound, the conformance requirements for the target attribute are defined in each
environment.
The content author needs to be aware that the user agent behavior for the target attribute depends on multiple factors such as the existence of an environment binding, restrictions of available
resources, existence of other applications and user preferences (such as pop-up blockers), and implementation-dependent design decisions. When there is no external environmental conformance, it is
recommended that authors do not depend on use of the target attribute.
It should be noted that any implementation-dependent use of the target attribute might impede interoperability.
This specification also adds the lang attribute to the I18N attribute collection as defined in [XHTMLMOD]. The lang attribute is defined in [HTML4]. When this attribute and the xml:lang attribute are specified on the same element, the xml:lang attribute takes precedence. When both lang and xml:lang are specified on the same element, they SHOULD have the same value.
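Informative sketch: the precedence rule for lang and xml:lang can be illustrated with a few lines of Python. The helper names effective_lang and langs_agree are hypothetical, the comparison is an exact string match (an assumption; the text above says only "the same value"), and the sketch looks at a single element without modelling inheritance from ancestors.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(element):
    """Language declared on this element; xml:lang takes precedence over lang."""
    xml_lang = element.get(XML_LANG)
    if xml_lang is not None:
        return xml_lang
    return element.get("lang")


def langs_agree(element):
    """False only when both attributes are present with different values."""
    xml_lang = element.get(XML_LANG)
    lang = element.get("lang")
    if xml_lang is None or lang is None:
        return True
    return xml_lang == lang


p = ET.fromstring('<p xml:lang="ja" lang="en">text</p>')
print(effective_lang(p), langs_agree(p))
```

An authoring tool could use a check like langs_agree to warn about elements that violate the SHOULD above.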
(*) = This module is a required XHTML Host Language module.
XHTML Basic also uses the XHTML inputmode Attribute Module, as defined in this specification. This module adds the inputmode attribute to the input and textarea elements of the XHTML Forms Module.
Finally, XHTML Basic adds the value attribute to the li element of the XHTML List Module.
An XML 1.0 DTD is available in Appendix B. An XML Schema implementation is available in Appendix C.
4. How to Use XHTML Basic
Although XHTML Basic can be used as it is - a simple XHTML language with text, links, and images - the intention of its simple design is for use as a host language. A host language can contain a mix of vocabularies all rolled into one document type. It is natural that XHTML is the host language, since that is what most Web developers are used to.
When markup from other languages is added to XHTML Basic, the resulting document type will be an extension of XHTML Basic. Content developers can develop for XHTML Basic or take advantage of the extensions. The goal of XHTML Basic is to serve as a common language supported by various kinds of user agents.
5. XHTML inputmode Attribute Module
This section is normative.
This section was originally a component of XForms 1.0, and was written by Martin Duerst.
The inputmode Attribute Module defines the inputmode attribute.
inputmode = CDATA
This attribute provides a hint to the user agent about the input mode appropriate for the text input expected in the current element.
The following table shows additional attributes for elements defined elsewhere when the inputmode module is selected.
Elements | Attributes | Notes
input& | inputmode (CDATA) | When the Basic Forms or Forms Module is selected.
textarea& | inputmode (CDATA) | When the Basic Forms or Forms Module is selected.
The attribute inputmode provides a hint to the user agent to select an appropriate input mode for the text input expected in an associated form control. The input mode may be a keyboard configuration, an input method editor (also called front end processor) or any other setting affecting input on the device(s) used.
Using inputmode, the author can give hints to the agent that make form input easier for the user. Authors should provide inputmode attributes wherever possible, making sure that the values used cover a wide range of devices.
5.1 inputmode Attribute Value Syntax
The value of the inputmode attribute is a white space separated list of tokens. Tokens are either sequences of alphabetic letters or absolute URIs. The latter can be distinguished from the former by noting that absolute URIs contain a ':'. Tokens are case-sensitive. All the tokens consisting of alphabetic letters only are defined in this specification, in 5.3 List of Tokens (or a successor of this specification).
This specification does not define any URIs for use as tokens, but allows others to define such URIs for extensibility. This may become necessary for devices with input modes that cannot be
covered by the tokens provided here. The URI should dereference to a human-readable description of the input mode associated with the use of the URI as a token. This description should describe the
input mode indicated by this token, and whether and how this token modifies other tokens or is modified by other tokens.
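Informative sketch: the value syntax above can be illustrated by a small Python tokenizer that splits an inputmode value on white space and sorts each token into one of the two kinds. The function name classify_tokens, the "invalid" label, and the example.org URI are hypothetical. Treating "alphabetic letters" as ASCII letters is an assumption of this sketch, not something the text above states.

```python
def classify_tokens(value):
    """Split an inputmode value and label each token as 'name', 'uri' or 'invalid'."""
    result = []
    # str.split() with no argument treats any run of white space as a separator.
    for token in value.split():
        if ":" in token:
            # Absolute URIs are recognised by the presence of a colon.
            kind = "uri"
        elif token.isascii() and token.isalpha():
            # Assumption: "alphabetic letters" means ASCII letters only.
            kind = "name"
        else:
            kind = "invalid"
        result.append((token, kind))
    return result


# Tokens are case-sensitive, so they are kept exactly as written.
print(classify_tokens("latin  lowerCase http://example.org/modes#pin"))
```

Whether a 'name' token is actually one of the tokens defined in 5.3 is a separate check that this sketch leaves out.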
5.2 User Agent Behavior
Upon entering an empty form control with an inputmode attribute, the user agent should select the input mode indicated by the inputmode attribute value. User agents should not use the inputmode attribute to set the input mode when entering a form control with text already present. To set the appropriate input mode when entering a form control that already contains text, user agents should rely on platform-specific conventions.
User agents should make available all the input modes which are supported by the (operating) system/device(s) they run on/have access to, and which are installed for regular use by the user. This
is typically only a small subset of the input modes that can be described with the tokens defined here.
Note: Additional guidelines for user agent implementation are found at [UAAG 1.0].
The following simple algorithm is used to define how user agents match the values of an inputmode attribute to the input modes they can provide. This algorithm does not have to be implemented directly; user agents just have to behave as if they used it. The algorithm is not designed to produce "obvious" or "desirable" results for every possible combination of tokens, but to produce correct behavior for frequent token combinations and predictable behavior in all cases.
First, each of the input modes available is represented by one or more lists of tokens. An input mode may correspond to more than one list of tokens; as an example, on a system set up for a Greek
user, both "greek upperCase" and "user upperCase" would correspond to the same input mode. No two lists will be the same.
Second, the inputmode attribute is scanned from front to back. For each token t in the inputmode attribute, if in the remaining lists of tokens representing available input modes there is any list of tokens that contains t, then all lists of tokens representing available input modes that do not contain t are removed. If there is no remaining list of tokens that contains t, then t is ignored.
Third, if one or more lists of tokens are left, and they all correspond to the same input mode, then this input mode is chosen. If no list is left (meaning that there was none at the start) or if
the remaining lists correspond to more than one input mode, then no input mode is chosen.
Example: Assume the list of lists of tokens representing the available input modes is: {"cyrillic upperCase", "cyrillic lowerCase", "cyrillic", "latin", "user upperCase", "user lowerCase"}, then the following inputmode values select the following input modes: "cyrillic title" selects "cyrillic", "cyrillic lowerCase" selects "cyrillic lowerCase", "lowerCase cyrillic" selects "cyrillic lowerCase", "latin upperCase" selects "latin", but "upperCase latin" does select "cyrillic upperCase" or "user upperCase" if they correspond to the same input mode, and does not select any input mode if "cyrillic upperCase" and "user upperCase" do not correspond to the same input mode.
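Informative sketch: the three steps of the matching algorithm can be written out literally in Python. The function name select_input_mode and the representation of available input modes as a dictionary (token list to an arbitrary input mode identifier) are assumptions made for illustration; user agents only have to behave as if they ran the algorithm.

```python
def select_input_mode(inputmode, available):
    """Pick an input mode for an inputmode value, or None if none is chosen.

    `available` maps each list of tokens (written as a string) to an
    identifier for the input mode it represents.
    """
    # Step 1: every available input mode is represented by lists of tokens.
    remaining = {frozenset(key.split()): mode for key, mode in available.items()}
    # Step 2: scan the attribute value from front to back.
    for token in inputmode.split():
        if any(token in tokens for tokens in remaining):
            # Drop every list that does not contain this token.
            remaining = {tokens: mode for tokens, mode in remaining.items()
                         if token in tokens}
        # Otherwise no remaining list contains the token, so it is ignored.
    # Step 3: choose a mode only if all remaining lists agree on it.
    modes = set(remaining.values())
    return modes.pop() if len(modes) == 1 else None


# Hypothetical setup in which every token list is its own input mode.
AVAILABLE = {name: name for name in (
    "cyrillic upperCase", "cyrillic lowerCase", "cyrillic",
    "latin", "user upperCase", "user lowerCase")}

print(select_input_mode("lowerCase cyrillic", AVAILABLE))  # cyrillic lowerCase
print(select_input_mode("latin upperCase", AVAILABLE))     # latin
print(select_input_mode("upperCase latin", AVAILABLE))     # None
```

If "user upperCase" is mapped to the same identifier as "cyrillic upperCase", the last call returns that shared mode instead of None. Because this sketch follows step three literally, a value that leaves several lists mapped to different modes, such as "cyrillic title" in the setup above, also yields None; choosing the bare "cyrillic" mode in that case would need a tie-breaking rule beyond the three steps as written.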
5.3 List of Tokens
Tokens defined in this specification are separated into two categories: script tokens and modifiers. In inputmode attributes, script tokens should always be listed before modifiers.
5.3.1 Script Tokens
Script tokens provide a general indication of the set of characters that is covered by an input mode. In most cases, script tokens correspond directly to [Unicode Scripts]. Some tokens correspond to the block names in Java class java.lang.Character.UnicodeBlock ([Java Unicode Blocks]) or Unicode Block names. However, this neither means that an input mode has to allow input for all the characters in the script or block, nor that an input mode is limited to only characters from that specific script. As an example, a "latin" keyboard doesn't cover all the characters in the Latin script, and includes punctuation which is not assigned to the Latin script. The version of the Unicode Standard that these script names are taken from is 3.2.
Input Mode Token | Comments
arabic | Unicode script name
armenian | Unicode script name
bengali | Unicode script name
bopomofo | Unicode script name
braille | used to input braille patterns (not to indicate a braille input device)
buhid | Unicode script name
canadianAboriginal | Unicode script name
cherokee | Unicode script name
cyrillic | Unicode script name
deseret | Unicode script name
devanagari | Unicode script name
ethiopic | Unicode script name
georgian | Unicode script name
greek | Unicode script name
gothic | Unicode script name
gujarati | Unicode script name
gurmukhi | Unicode script name
han | Unicode script name
hangul | Unicode script name
hanja | Subset of 'han' used in writing Korean
hanunoo | Unicode script name
hebrew | Unicode script name
hiragana | Unicode script name (may include other Japanese scripts produced by conversion from hiragana)
ipa | International Phonetic Alphabet
kanji | Subset of 'han' used in writing Japanese
kannada | Unicode script name
katakana | Unicode script name (full-width, not half-width)
khmer | Unicode script name
lao | Unicode script name
latin | Unicode script name
malayalam | Unicode script name
math | mathematical symbols and related characters
mongolian | Unicode script name
myanmar | Unicode script name
ogham | Unicode script name
oldItalic | Unicode script name
oriya | Unicode script name
runic | Unicode script name
simplifiedHanzi | Subset of 'han' used in writing Simplified Chinese
sinhala | Unicode script name
syriac | Unicode script name
tagalog | Unicode script name
tagbanwa | Unicode script name
tamil | Unicode script name
telugu | Unicode script name
thaana | Unicode script name
thai | Unicode script name
tibetan | Unicode script name
traditionalHanzi | Subset of 'han' used in writing Traditional Chinese
user | Special value denoting the 'native' input of the user (e.g. to input her name or text in her native language).
yi | Unicode script name
5.3.2 Modifier Tokens
Modifier tokens can be added to the scripts they apply to in order to more closely specify the kind of characters expected in the form control. Traditional PC keyboards do not need most modifier tokens (indeed, users on such devices would be quite confused if the software decided to change case on its own; CAPS lock for upperCase may be an exception). However, modifier tokens can be very helpful to set input modes for small devices.
Input Mode Token | Comments
lowerCase | lowercase (for bicameral scripts)
upperCase | uppercase (for bicameral scripts)
titleCase | title case (for bicameral scripts): words start with an upper case letter
startUpper | start input with one uppercase letter, then continue with lowercase letters
digits | digits of a particular script (e.g. inputmode='thai digits')
symbols | symbols, punctuation (suitable for a particular script)
predictOn | text prediction switched on (e.g. for running text)
predictOff | text prediction switched off (e.g. for passwords)
halfWidth | half-width compatibility forms (e.g. Katakana; deprecated)
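Informative sketch: the recommendation in 5.3 that script tokens be listed before modifiers can be checked mechanically. In the Python fragment below, the function name scripts_before_modifiers is hypothetical, the MODIFIERS set copies the modifier tokens listed above, and any other alphabetic token is assumed to be a script token. URI tokens are skipped because this specification does not place them in either category.

```python
# Modifier tokens as listed in 5.3.2.
MODIFIERS = {
    "lowerCase", "upperCase", "titleCase", "startUpper", "digits",
    "symbols", "predictOn", "predictOff", "halfWidth",
}


def scripts_before_modifiers(value):
    """True if no script token follows a modifier token in the value."""
    seen_modifier = False
    for token in value.split():
        if ":" in token:
            # URI token: not categorised here, so it does not affect the check.
            continue
        if token in MODIFIERS:
            seen_modifier = True
        elif seen_modifier:
            # Assumed to be a script token appearing after a modifier.
            return False
    return True


print(scripts_before_modifiers("latin digits"))  # True
print(scripts_before_modifiers("digits latin"))  # False
```

Since the ordering is a "should" rather than a "must", an authoring tool would more plausibly report a failed check as a warning than as an error.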
5.4 Relationship to XML Schema pattern facets
User agents may use information available in an XML Schema pattern facet to set the input mode. Note that a pattern facet is a hard restriction on the lexical value of an instance data node, and can specify different restrictions for different parts of the data item. Attribute inputmode is a soft hint about the kinds of characters that the user may most probably start to input into the form control. Attribute inputmode is provided in addition to pattern facets for the following reasons:
The set of allowable characters specified in a pattern may be so wide that it is not possible to deduce a reasonable input mode setting. Nevertheless, there frequently is a kind of character that will be input by the user with high probability. In such a case, inputmode allows the input mode to be set for the user's convenience.
In some cases, it would be possible to derive the input mode setting from the pattern because the set of characters allowed in the pattern closely corresponds to a set of characters covered by an inputmode attribute value. However, such a derivation would require a lot of data and calculations on the user agent.
Small devices may leave the checking of patterns to the server, but will easily be able to switch to those input modes that they support. Being able to make data entry for the user easier is of
particular importance on small devices.
5.5 Examples
This is an example of a form for Japanese address input.
Family name:
(in kana):
Given name:
(in kana):
Postal code:
Address:
(in kana):
Email:
Telephone:
Comments: