HTML Standard

Introduction
1.1 Where does this specification fit?
1.2 Is this HTML5?
1.3 Background
1.4 Audience
1.5 Scope
1.6 History
1.7 Design notes
1.7.1 Serializability of script execution
1.7.2 Extensibility
1.8 HTML vs XML syntax
1.9 Structure of this specification
1.9.1 How to read this specification
1.9.2 Typographic conventions
1.10 A quick introduction to HTML
1.10.1 Writing secure applications with HTML
1.10.2 Common pitfalls to avoid when using the scripting APIs
1.10.3 How to catch mistakes when writing HTML: validators and conformance checkers
1.11 Conformance requirements for authors
1.11.1 Presentational markup
1.11.2 Syntax errors
1.11.3 Restrictions on content models and on attribute values
1.12 Suggested reading
Introduction
1.1 Where does this specification fit?
This specification defines a big part of the web platform, in lots of detail. Its place in the
web platform specification stack relative to other specifications can best be summed up as
follows:
1.2 Is this HTML5?
This section is non-normative.
In short: Yes.
In more length: the term "HTML5" is widely used as a buzzword to refer to modern web
technologies, many of which (though by no means all) are developed at the WHATWG. This document is
one such; others are available from the WHATWG Standards overview.
1.3 Background
This section is non-normative.
HTML is the World Wide Web's core markup language. Originally, HTML was primarily designed as a
language for semantically describing scientific documents. Its general design, however, has
enabled it to be adapted, over the subsequent years, to describe a number of other types of
documents and even applications.
1.4 Audience
This section is non-normative.
This specification is intended for authors of documents and scripts that use the features
defined in this specification, implementers of tools that operate on pages that
use the features defined in this specification, and individuals wishing to establish the
correctness of documents or implementations with respect to the requirements of this
specification.
This document is probably not suited to readers who do not already have at least a passing
familiarity with web technologies, as in places it sacrifices clarity for precision, and brevity
for completeness. More approachable tutorials and authoring guides can provide a gentler
introduction to the topic.
In particular, familiarity with the basics of the DOM is necessary for a complete understanding of
some of the more technical parts of this specification. An understanding of Web IDL, HTTP, XML,
Unicode, character encodings, JavaScript, and CSS will also be helpful in places but is not
essential.
1.5 Scope
This section is non-normative.
This specification is limited to providing a semantic-level markup language and associated
semantic-level scripting APIs for authoring accessible pages on the web ranging from static
documents to dynamic applications.
The scope of this specification does not include providing mechanisms for media-specific
customization of presentation (although default rendering rules for web browsers are included at
the end of this specification, and several mechanisms for hooking into CSS are provided as part of
the language).
The scope of this specification is not to describe an entire operating system. In particular,
hardware configuration software, image manipulation tools, and applications that users would be
expected to use with high-end workstations on a daily basis are out of scope. In terms of
applications, this specification is targeted specifically at applications that would be expected
to be used by users on an occasional basis, or regularly but from disparate locations, with low
CPU requirements. Examples of such applications include online purchasing systems, searching
systems, games (especially multiplayer online games), public telephone books or address books,
communications software (email clients, instant messaging clients, discussion software), document
editing software, etc.
1.6 History
This section is non-normative.
For its first five years (1990-1995), HTML went through a number of revisions and experienced a
number of extensions, primarily hosted first at CERN, and then at the IETF.
With the creation of the W3C, HTML's development changed venue again. A first, abortive attempt
at extending HTML in 1995, known as HTML 3.0, then made way for a more pragmatic approach known as
HTML 3.2, which was completed in 1997. HTML4 quickly followed later that same year.
The following year, the W3C membership decided to stop evolving HTML and instead begin work on
an XML-based equivalent, called XHTML. This
effort started with a reformulation of HTML4 in XML, known as XHTML 1.0, which added no new
features except the new serialization, and which was completed in 2000. After XHTML 1.0, the W3C's
focus turned to making it easier for other working groups to extend XHTML, under the banner of
XHTML Modularization. In parallel with this, the W3C also worked on a new language that was not
compatible with the earlier HTML and XHTML languages, calling it XHTML2.
Around the time that HTML's evolution was stopped in 1998, parts of the API for HTML developed
by browser vendors were specified and published under the name DOM Level 1 (in 1998) and DOM Level
2 Core and DOM Level 2 HTML (starting in 2000 and culminating in 2003). These efforts then petered
out, with some DOM Level 3 specifications published in 2004 but the working group being closed
before all the Level 3 drafts were completed.
In 2003, the publication of XForms, a technology which was positioned as the next generation of
web forms, sparked a renewed interest in evolving HTML itself, rather than finding replacements
for it. This interest was born of the realization that XML's deployment as a web technology was
limited to entirely new technologies (like RSS and later Atom), rather than as a replacement for
existing deployed technologies (like HTML).
A proof of concept to show that it was possible to extend HTML4's forms to provide many of the
features that XForms 1.0 introduced, without requiring browsers to implement rendering engines
that were incompatible with existing HTML web pages, was the first result of this renewed
interest. At this early stage, while the draft was already publicly available, and input was
already being solicited from all sources, the specification was only under Opera Software's
copyright.
The idea that HTML's evolution should be reopened was tested at a W3C workshop in 2004, where
some of the principles that underlie the HTML5 work (described below), as well as the
aforementioned early draft proposal covering just forms-related features, were presented to the
W3C jointly by Mozilla and Opera. The proposal was rejected on the grounds that it conflicted
with the previously chosen direction for the web's evolution; the W3C staff and
membership voted to continue developing XML-based replacements instead.
Shortly thereafter, Apple, Mozilla, and Opera jointly announced their intent to continue
working on the effort under the umbrella of a new venue called the WHATWG. A public mailing list
was created, and the draft was moved to the WHATWG site. The copyright was subsequently amended to
be jointly owned by all three vendors, and to allow reuse of the specification.
The WHATWG was based on several core principles, in particular that technologies need to be
backwards compatible, that specifications and implementations need to match even if this means
changing the specification rather than the implementations, and that specifications need to be
detailed enough that implementations can achieve complete interoperability without
reverse-engineering each other.
This last requirement in particular meant that the scope of the HTML5 specification had to include
what had previously been specified in three separate documents: HTML4, XHTML1, and DOM2 HTML. It
also meant including significantly more detail than had previously been considered the norm.
In 2006, the W3C indicated an interest in participating in the development of HTML5 after all,
and in 2007 formed a working group chartered to work with the WHATWG on the development of the
HTML5 specification. Apple, Mozilla, and Opera allowed the W3C to publish the specification under
the W3C copyright, while keeping a version with the less restrictive license on the WHATWG
site.
For a number of years, both groups then worked together. In 2011, however, the groups came to
the conclusion that they had different goals: the W3C wanted to publish a "finished" version of
"HTML5", while the WHATWG wanted to continue working on a Living Standard for HTML, continuously
maintaining the specification rather than freezing it in a state with known problems, and adding
new features as needed to evolve the platform.
In 2019, the WHATWG and W3C signed an agreement to collaborate on a single version of HTML going
forward: this document.
1.7 Design notes
This section is non-normative.
It must be admitted that many aspects of HTML appear at first glance to be nonsensical and
inconsistent.
HTML, its supporting DOM APIs, as well as many of its supporting technologies, have been
developed over a period of several decades by a wide array of people with different priorities
who, in many cases, did not know of each other's existence.
Features have thus arisen from many sources, and have not always been designed in especially
consistent ways. Furthermore, because of the unique characteristics of the web, implementation
bugs have often become de facto, and now de jure, standards, as content is often unintentionally
written in ways that rely on them before they can be fixed.
Despite all this, efforts have been made to adhere to certain design goals. These are described
in the next few subsections.
1.7.1 Serializability of script execution
This section is non-normative.
To avoid exposing web authors to the complexities of multithreading, the HTML and DOM APIs are
designed such that no script can ever detect the simultaneous execution of other scripts. Even
with workers, the intent is that the behavior of implementations can be thought of as completely
serializing the execution of all scripts in all globals.
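Within a single event loop, this design shows up as "run to completion": once a piece of script starts, nothing else queued on that loop can interleave with it. The following is a minimal sketch, assuming a JavaScript runtime with `setTimeout` and `queueMicrotask` (a browser or Node.js); the function name `runToCompletionDemo` and the 50 ms busy-wait are illustrative choices, not part of this specification. It covers only the single-event-loop case, not workers.

```typescript
// Hypothetical helper: returns the order in which three pieces of code actually ran.
function runToCompletionDemo(busyMs = 50): Promise<string[]> {
  const log: string[] = [];
  return new Promise((resolve) => {
    // A task that is due "immediately", but can only run once the current script has finished.
    setTimeout(() => {
      log.push("timer callback");
      resolve(log);
    }, 0);

    // A microtask: also deferred until the current synchronous code completes.
    queueMicrotask(() => log.push("microtask"));

    // Busy-wait without yielding. The timer is overdue long before this loop
    // ends, yet its callback cannot interrupt it.
    log.push("sync start");
    const start = Date.now();
    while (Date.now() - start < busyMs) {
      // spin
    }
    log.push("sync end");
  });
}

runToCompletionDemo().then((order) => console.log(order.join(" -> ")));
// Logs: sync start -> sync end -> microtask -> timer callback
```

Because the busy-wait never yields, the overdue timer callback has no opportunity to run in the middle of it, so from the script's point of view execution is strictly serialized.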
The exception to this general design principle is the JavaScript SharedArrayBuffer class. Using
SharedArrayBuffer objects, it can in fact be observed that scripts in other agents are executing
simultaneously. Furthermore, due to the JavaScript memory model, there are situations that are
unrepresentable not only via serialized script execution, but also via serialized statement
execution among those scripts.
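The first point can be seen directly: a script that never yields can still watch shared memory change underneath it. The sketch below assumes Node.js, using its `worker_threads` module as a stand-in for browser workers (in browsers, SharedArrayBuffer is additionally only available in cross-origin isolated contexts). The function name `observeConcurrentWrite` and the 5-second timeout are hypothetical. It does not attempt to illustrate the memory-model point about statement-level serialization.

```typescript
import { Worker } from "node:worker_threads";

// Hypothetical helper: returns true if this thread sees another agent's write
// while it is still inside one uninterrupted synchronous loop.
function observeConcurrentWrite(timeoutMs = 5000): boolean {
  const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const flag = new Int32Array(shared); // flag[0] starts at 0

  // The worker (a separate agent) receives the same memory and sets flag[0] to 1.
  const worker = new Worker(
    `const { workerData } = require("worker_threads");
     Atomics.store(new Int32Array(workerData), 0, 1);`,
    { eval: true, workerData: shared },
  );

  // Spin without ever yielding to the event loop. Under fully serialized
  // execution, flag[0] could not change until this loop had finished.
  const deadline = Date.now() + timeoutMs;
  let seen = false;
  while (Date.now() < deadline) {
    if (Atomics.load(flag, 0) === 1) {
      seen = true;
      break;
    }
  }
  void worker.terminate();
  return seen;
}

console.log(observeConcurrentWrite());
```

The loop never returns control to the event loop, so the only way it can see `flag[0]` become 1 is if the worker runs at the same time, which is exactly the observation serialized execution would rule out.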
1.7.2 Extensibility
This section is non-normative.
HTML has a wide array of extensibility mechanisms that can be used for adding semantics in a
safe manner:
Authors can use the class attribute to extend elements, effectively creating their own elements,
while using the most applicable existing "real" HTML element, so that browsers and other tools that
don't know of the extension can still support it somewhat well. This is the tack used by
microformats, for example.
Authors can include data for inline client-side scripts or server-side site-wide scripts to
process using the data-*="" attributes. These are guaranteed to never be touched by browsers, and
allow scripts to include data on HTML elements that scripts can then look for and process.
Authors can use the <meta name="" content=""> mechanism to include page-wide metadata.
Authors can use the rel="" mechanism to annotate links with specific meanings by registering
extensions to the predefined set of link types. This is also used by microformats.
Authors can embed raw data using the