Architectural Principles of the World Wide Web
Comments by GK are indicated thus
W3C
Working Draft 30 August 2002
This version:
Latest version:
Editor: Ian Jacobs, W3C
Authors: See acknowledgments
W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability,
trademark, document use, and software licensing rules apply.
Abstract
The World Wide Web is a networked information system. Web
Architecture is the set of principles that all agents in the system
follow to create the large-scale effect of a shared information
space. Identification, data formats, and protocols are the main
technical components of Web Architecture, but the large-scale
effect depends on social behavior as well.
This document strives to establish a reference set of principles
for Web architecture.
Status of this document
This section describes the status of this document at the
time of its publication. Other documents may supersede this
document. The latest status of this document series is maintained
at the W3C.
This is the first public Working Draft of "Architectural
Principles of the World Wide Web." This document has been developed
by W3C's Technical Architecture Group (TAG) (charter).
This draft represents substantial input from TAG participants,
but does not yet represent consensus. It is also incomplete;
sections 1 and 2 are the most developed, 3 and 4 the least. The TAG
has published a number of findings that address specific
architecture issues. Parts of those findings may appear in
subsequent drafts. Please also consult the list of issues under
consideration by the TAG.
Publication as a Working Draft does not imply endorsement by the
W3C Membership. This is a draft document and may be updated,
replaced or obsoleted by other documents at any time. It is
inappropriate to cite this document as other than "work in
progress."
The latest information regarding patent disclosures related to this
document is available on the Web. As of this publication, there
are no disclosures.
Please send comments on this document to the public W3C TAG
mailing list www-tag@w3.org (archive).
A list of current W3C Recommendations and other technical documents
can be found at the W3C Web site.
Table of Contents
1. Introduction
1.1. Structure and conventions of this document
1.2. Audience of this document
1.3. Limits of this document
1.4. Summary of principles
1.5. Summary of good practice notes
2. Identifiers and resources
2.1. Resources, URIs, and the shared information space
2.2. Operations on absolute URI references
2.3. Persistence
2.4. URI Schemes
2.5. Fragment identifiers
2.6. Some generalities about absolute URI references
3. Formats
3.1. Scope
3.2. Content, Presentation, and Interaction
3.3. Ideas and issues
4. Protocols
4.1. REST constraints
4.2. Ideas and issues
5. Glossary
6. References
6.1. Normative References
6.2. Non-Normative References
7. End notes
8. Acknowledgments
1. Introduction
The World Wide Web (or, Web) is a networked information system
consisting of agents (programs acting on behalf of another person,
entity, or process) that exchange information.
The term "agents" seems odd here - at this point in the document, why not just "...consisting of programs that exchange information"?
This architecture consists of:
"consists of" seems very closed, and makes the architecture seem rather tangible; maybe "employs the following building blocks"?
Identifiers. A single specification to
identify objects in the system: the Uniform Resource Identifier
(URI) [RFC2396].
This seems to imply that RFC2396 actually does the identifying. Suggest: "A single form of identifier for objects ...".
Formats. A nonexclusive set of data format specifications designed for interchange between agents in the system. This includes several data formats used in isolation
Is anything truly "in isolation" on the web? Suggest: "separately"
or in combination (e.g., XHTML, CSS, PNG, XLink, RDF, SMIL animation), as well as technologies for designing
"technologies for constructing"?
new data formats (XML, XML Namespaces).
Protocols. A small and nonexclusive set of
protocol specifications for interchanging information between
agents, including HTTP [RFC2616], SMTP, and others. Several of these
protocols share a reliance on the Internet Media Type (or, "MIME")
metadata/packaging system [RFC2046].
Wording of the final point above seems to confuse MIME content-types with the MIME encapsulation format. The phrase "Internet Media Type" is not one I'm aware is commonly used, though the intent is clear enough. I suggest mentioning the MIME content-types in the second point (e.g. "These data formats are identified using MIME content-type values [RFC2045]"). Then in the third point: "... share a reliance on the MIME metadata/packaging system [RFC2045] [RFC2046]". Also, note RFC2045 is usually cited as the primary MIME reference. RFC2046 describes some of the MIME media types and some common content types.
1.1. Structure and conventions of this document
After this introduction, sections two, three, and four discuss
identifiers, formats, and protocols, respectively. Each section
highlights principles of Web architecture and notes on good
practice. These principles and good practice notes are summarized
at the end of the introduction.
The terms MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are used
in accordance with RFC 2119 [RFC2119].
Some issues and editorial notes are indicated.
1.2. Audience of this document
The intended audience for this document includes:
Participants in W3C groups,
Other groups and individuals developing technologies to be
integrated into the Web.
The authors have made every effort to keep this document terse,
with the expectation that additional documents will elaborate on
the principles.
1.3. Limits of this document
This document focuses on architectural principles specific to or
fundamental to the Web. It does not address general principles of
design, which are also important to the success of the Web. Indeed,
behind many of the principles of Web Architecture lie these and
other principles such as minimal constraint (fewer rules make the
system more flexible), modularity, minimum redundancy,
extensibility, simplicity, and robustness.
This document does not address architectural design goals
covered by targeted W3C specifications:
Internationalization; see W3C's Internationalization Activity
Accessibility; see W3C's Web Accessibility Initiative
Device independence; see W3C's Device Independence Activity
Some of these principles may conflict with current practice, and
so education and outreach will be required to improve on that
practice. Other principles may fill in gaps in published
specifications or may call attention to known weaknesses in those
specifications.
1.4. Summary of principles
In the design of the Web, some design decisions, like the names of
the p and li elements in HTML, or the choice of the colon
character in URIs, are somewhat arbitrary; if other names or another
character had been chosen instead, the large-scale result would, most
likely, have been the same. Other design choices are more critical
"more fundamental"?
; these are the architectural principles of the Web:
    1. Use absolute URI references:
    All important resources SHOULD be identified by an absolute URI reference.
    Hmmm... doesn't this confuse the identifier/reference distinction that TimBL mentioned recently? [
    2. Absolute URI references are unambiguous:
    Each absolute URI reference unambiguously identifies one resource.
    Ditto... I see this is pretty pervasive; I won't mention it again.
    3. Describe resources:
    Owners of important resources (for example, Internet protocol parameters) SHOULD make available representations that describe the nature and purpose of those resources.
    Is this a principle or a good practice?
    4. Representation retrieval is safe:
    Agents do not incur obligations by retrieving a representation.
    5. Be aware of context-sensitivity in absolute URI references:
    Owners and users of absolute URI references SHOULD ensure that
    any context-sensitivity of these identifiers is appropriate.
    6. Use consistent representations:
    There is a strong expectation of consistency between the representations of a resource; to the extent possible, representations SHOULD be equivalent.
    Hmmm... I'm bothered by the term "equivalent" here, even as qualified. To the extent that any representation is a "projection" of the thing it represents, equivalent seems too strong. I'd suggest something like "about the same thing".
    7. Support persistence:
    Those who create and manage resources and their identifiers SHOULD design the identifiers in such a way as to ensure their persistence.
    Referring back to the description of the architecture in section 1, this doesn't seem to be a part of the "architecture". I'd see this as a good practice. Also, this begs what is meant here by "persistence" -- many web publishers will not have the means to ensure persistence, but may still be able to allocate identifiers in a way that avoids or discourages re-use for a different purpose. (I see this is touched upon later.)
    8. Avoid unnecessary new URI schemes:
    Authors of specifications SHOULD avoid introducing new URI
    schemes when existing schemes can be used to meet the goals of the
    specifications.
    9. Do not use unregistered URI schemes:
    Unregistered URI schemes MUST NOT be used on the public Internet.
    1.5. Summary of good practice notes
    This document suggests the following good practice:
    1. Do not rely on URI case insensitivity:
    It SHOULD NOT be assumed that URIs which differ only in character case can be used interchangeably.
    Isn't this a MUST NOT? Isn't this a principle?
    2. Be aware of content negotiation and fragment semantics:
    Authors SHOULD NOT use HTTP content negotiation for different
    media types that do not share the same fragment identifier
    semantics.
    2. Identifiers and resources
    I'd like to see TimBL's discussion of identifiers and references incorporated in this section.
    The Web is a universe of resources. A resource is defined by
    [RFC2396] to be anything that has identity. Examples include documents,
    files, menu items, machines, and services, as well as people,
    organizations, and concepts. Web architecture starts with a uniform
    syntax for resource identifiers, so that we can refer to resources,
    access them, describe them, and share them. The Uniform Resource
    Identifier (URI) syntax employs an extensible set of URI schemes.
    Several URI schemes incorporate into this syntax some identification
    mechanisms that pre-date the Web:
    MAILTO URIs identify mailbox names:
    mailto:nobody@example.org
    FTP URIs identify FTP file and directory names:
    ftp://example.org/aDirectory/aFile
    NEWS URIs identify newsgroup names:
    news:comp.infosystems.www
    TEL URIs identify telephone numbers:
    tel:+1-816-555-1212
    URN UUID URIs identify Universal Unique Identifiers:
    urn:uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882
    Other URI schemes have been introduced since the advent of the
    Web, including those introduced as a consequence of new protocols.
    Examples of URIs for these schemes include:
    ldap://ldap.itd.umich.edu/c=GB?objectClass?one
    urn:oasis:SAML:1.0
    One can append a fragment identifier to a URI to yield an
    identifier for part of, or a view of, a resource:
    ftp://example.org/aDirectory/aDocument#section1
    Note that while this composition is syntactically fully general,
    it is meaningless in some URI schemes. The absolute URI reference
    mailto:nobody@example.org#abc is meaningless in practice.
    The syntax of URIs and absolute URI references is defined in
    [RFC2396]. In brief:
    A Uniform Resource Identifier, or URI, is a character
    sequence starting with a scheme name, followed by a number of
    scheme-specific fields.
    An absolute URI reference is a URI followed optionally by a
    fragment identifier.
    URIs and absolute URI references identify Web resources. The
    principles in this document are expressed in terms of absolute URI
    references.
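As a rough illustration of the two definitions above, the following Python sketch splits an absolute URI reference into its URI, its scheme, and its optional fragment identifier. It relies on the standard library's urllib.parse module; the helper name describe is hypothetical and chosen only for this example.

```python
from urllib.parse import urldefrag, urlsplit


def describe(reference: str) -> dict:
    """Split an absolute URI reference into URI, scheme, and fragment."""
    uri, fragment = urldefrag(reference)  # text before and after '#'
    return {
        "uri": uri,
        "scheme": urlsplit(uri).scheme,   # string before the first colon
        "fragment": fragment or None,     # None when no fragment is present
    }


if __name__ == "__main__":
    print(describe("ftp://example.org/aDirectory/aDocument#section1"))
    print(describe("mailto:nobody@example.org"))
```

The sketch treats a reference without a fragment as a plain URI, which matches the "followed optionally by" wording of the definition.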
    Note: The current URI specification [RFC2396] defines a URI
    reference to be either
    an absolute URI reference or a relative URI reference. The syntax
    for a relative URI reference is a shortened form of that for an
    absolute URI reference, where some prefix of the URI is missing and
    certain path components ("." and "..") have a special meaning when,
    and only when, interpreting a relative path. For example, in a
    document whose base URI is
    , the relative URI
    reference
    ../file2
    is a shortened form of
    and the relative URI
    reference
    #abc
    is a shortened form for
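The resolution of relative URI references described in this note can be tried with Python's urllib.parse.urljoin. This is only a sketch: the base URI http://example.org/dir/file1 is a hypothetical value chosen for illustration and is not taken from this document.

```python
from urllib.parse import urljoin

# Hypothetical base URI, assumed only for this example.
BASE = "http://example.org/dir/file1"


def resolve(relative_reference: str, base: str = BASE) -> str:
    """Expand a relative URI reference against a base URI."""
    return urljoin(base, relative_reference)


if __name__ == "__main__":
    print(resolve("../file2"))  # ".." steps up one path segment
    print(resolve("#abc"))      # a bare fragment keeps the base URI
```

With this assumed base, "../file2" expands to http://example.org/file2 and "#abc" expands to http://example.org/dir/file1#abc.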
    Editor's note: While people agree
    that URIs identify resources (per [RFC2396]), there is not yet
    consensus that
    absolute URI references with fragment identifiers may be used to
    identify resources. Some people contend that an absolute URI
    reference with a fragment identifier identifies a portion of a
    representation.
    2.1. Resources, URIs, and the shared information space
    When one resource refers to another via an absolute URI reference,
    a link is formed. When many
    resources are linked this way, the large-scale effect is a shared
    information space, addressable by absolute URI reference. The value
    of the Web increases with the number of resources addressable by
    absolute URI reference. In turn, resources are more valuable when
    they are addressable in the Web. Hence:
    Use absolute URI references: All important
    resources SHOULD be identified by an absolute URI reference.
    There are many benefits to making resources addressable by
    absolute URI reference. Some are by design (e.g., linking and
    bookmarking), while others are serendipitous (e.g., global search
    services). See the TAG finding "URIs, Addressability, and the use
    of HTTP GET" for some details about how this principle applies
    to HTTP application design.
    2.2. Operations on absolute URI references
    The two primary operations on absolute URI references are:
    Comparison of identifiers
    Interaction with resources
    2.2.1. Comparison of identifiers
    There may be applications (e.g., XML namespace names [XMLNS])
    where comparison is expected
    to be the sole or primary operation on an absolute URI reference.
    Certain URI schemes provide rules for determining the syntactic
    equivalence of absolute URI references, i.e., whether two absolute
    URI references are different spellings of the same identifier.
    These rules vary from scheme to scheme.
    For example, URNs begin with two colon-delimited fields, the
    first of which is the string urn and the second of which
    identifies the subclass of URN, for example urn:ietf:example.
    In URNs, these two fields are to be
    compared in a case-insensitive fashion. The remainder of the URN
    following the second colon is subject to rules dependent on the
    content of the second field (following the first colon); thus the
    equivalence rules may vary from one URN subclass to another.
    Section 3.2.3 of the HTTP specification [RFC2616] states that,
    when comparing two HTTP
    URIs, the host name part must be considered case-insensitive, so
    and
    identify the same resource.
    Good practice note. Do not rely on URI case insensitivity:
    It SHOULD NOT be assumed that URIs which differ only in character case can be used interchangeably.
    MUST NOT?
    Note: Equivalence of URIs is not the same as
    consistent representations of a resource.
    Issue URIEquivalence-15:
    When are two URI variants considered equivalent?
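The scheme-specific comparison rules discussed in this section can be sketched as comparison keys in Python. This is a minimal sketch, not a complete equivalence algorithm: http_key follows the rules of section 3.2.3 of [RFC2616] (case-insensitive scheme and host, default port 80, empty path equal to "/") and ignores percent-encoding, while urn_key only folds the case of the first two URN fields. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def http_key(reference: str) -> tuple:
    """Comparison key for an HTTP URI (RFC 2616, section 3.2.3)."""
    parts = urlsplit(reference)
    return (
        parts.scheme,       # urlsplit returns the scheme in lowercase
        parts.hostname,     # the hostname attribute is lowercased
        parts.port or 80,   # an absent port equals the default port
        parts.path or "/",  # an empty path equals "/"
        parts.query,        # compared exactly, case included
        parts.fragment,     # compared exactly, case included
    )


def urn_key(urn: str) -> tuple:
    """Comparison key for a URN: only the first two fields ignore case."""
    prefix, subclass, rest = urn.split(":", 2)
    return (prefix.lower(), subclass.lower(), rest)


if __name__ == "__main__":
    print(http_key("HTTP://WWW.Example.ORG:80/a") == http_key("http://www.example.org/a"))
    print(http_key("http://example.org/A") == http_key("http://example.org/a"))
    print(urn_key("URN:IETF:example") == urn_key("urn:ietf:example"))
```

The keys deliberately leave the path untouched, in line with the good practice note: two URIs that differ only in the case of their paths are not treated as interchangeable.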
    2.2.2. Interactions with resources
    To dereference an absolute URI reference is
    to interact with the resource it identifies. One
    interacts with a resource by the exchange of representations
    of resource state.
    A resource is an abstraction for which there is a
    conceptual mapping to a (possibly empty) set of representations.
    Representations, when transferred by a Web protocol, are often
    accompanied by metadata, usually based on [RFC2046].
    RFC2046 defines some specific MIME content types: do you mean metadata in this limited sense, or the more general sense of (say) Content-language? I think RFC2045 may be a more appropriate citation here.
    In particular, the media type metadata value is key to the correct interpretation of a resource representation, and entirely governs the handling of fragment identifiers.
    The term "media type" is sometimes used to indicate only part of the MIME content type; e.g. the "text" of "text/plain"; I suggest "content type".
    For instance, suppose the URI
    identifies
    a resource that is "the weather forecast for Oaxaca, Mexico". A
    representation retrieved by means of that URI may be encoded in any
    number of formats, including HTML, XHTML, and SVG; see
    section 3 for more information about formats.
    Interaction with a resource is governed by successive
    application of a finite set of specifications, beginning with the
    specification that governs the
    scheme of the URI
    . For example, suppose the
    absolute URI reference for the weather forecast is used within an
    element of an SVG document. The sequence of
    specifications applied is:
    The URI specification [RFC2396]. This specification says (in section
    3.1) that the scheme defines "the semantics for the remainder of the
    URI string." In this case, the URI scheme is HTTP.
    The HTTP/1.1 protocol. Section 3.2.2 of RFC2616 [RFC2616] explains
    the semantics of HTTP URIs.
    The SVG 1.0 Recommendation [SVG10], which imports the link semantics
    defined by XLink 1.0 [XLink10]. Section
    17.4 of the SVG specification suggests that interaction with an
    link involves retrieving a representation of a
    resource, identified by the XLink href attribute: "By
    activating these links (by clicking with the mouse, through
    keyboard input, and voice commands), users may visit these
    resources." This means that the GET method defined in HTTP/1.1 is
    used to retrieve the representation of the resource.
    Once the representation has been retrieved, the media type of
    the representation governs its interpretation (here, for
    rendering).
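The successive application of specifications described above can be sketched as a chain of lookups: the URI syntax yields the scheme, the scheme selects the protocol specification, and the media type of the retrieved representation selects its interpretation. The two tables and the function names below are hypothetical and hold only the entries needed for this example; the example URI is also an assumption.

```python
from email.message import Message
from urllib.parse import urlsplit

# Hypothetical lookup tables; a real agent would know many more entries.
SCHEME_SPECS = {"http": "RFC 2616"}
MEDIA_TYPE_ACTIONS = {
    "image/svg+xml": "render as SVG",
    "text/html": "render as HTML",
}


def media_type(content_type: str) -> str:
    """Extract the bare media type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_type()  # lowercased, parameters removed


def dereference_plan(uri: str, content_type: str) -> list:
    """List, in order, what governs each step of the interaction."""
    scheme = urlsplit(uri).scheme         # first: generic URI syntax
    protocol_spec = SCHEME_SPECS[scheme]  # then: the scheme's specification
    action = MEDIA_TYPE_ACTIONS[media_type(content_type)]  # last: media type
    return [f"scheme: {scheme}", f"protocol: {protocol_spec}", f"action: {action}"]


if __name__ == "__main__":
    for step in dereference_plan("http://example.org/forecast", "image/svg+xml"):
        print(step)
```

The sketch performs no network access; it only shows the order in which each specification takes over from the previous one.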
    It is important for the correct functioning of the Web that the
    mapping between URIs and resources be unambiguous.
    Absolute URI references are unambiguous:
    Each absolute URI reference unambiguously identifies one
    resource.
    There may be several ways to interact with a resource. One of
    the most important operations for the Web is to retrieve a
    representation of a resource (such as with HTTP GET), which means
    to retrieve a snapshot of a state of the resource.
    There are other ways to
    interact with a resource (such as with HTTP POST). Dereference
    mechanisms vary by URI scheme. For instance, the URN scheme
    [RFC2141] does not guarantee that a dereference
    procedure is defined for any given URN.
    Agents should be able to dereference absolute URI references for
    important resources.
    Describe resources: Owners of
    important resources (for example, Internet protocol parameters)
    SHOULD make available representations that describe the nature and
    purpose of those resources.
    Issue namespaceDocument-8:
    What should a "namespace document" look like?
    Representation retrieval is safe: Agents do
    not incur obligations by retrieving a representation.
    For instance, a user does not incur an obligation by following
    an HTML link that causes the user agent to retrieve a
    representation.
    Note: See the TAG finding "URIs, Addressability, and the use of
    HTTP GET" for more information about safe retrieval.
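One way an HTTP application might reflect the safe-retrieval principle is to choose the request method according to whether the interaction commits the user to anything. The sketch below assumes the set of safe methods from section 9.1.1 of RFC 2616 (GET and HEAD); the function names are hypothetical.

```python
# Methods that RFC 2616, section 9.1.1, asks implementers to treat as safe.
SAFE_METHODS = frozenset({"GET", "HEAD"})


def is_safe(method: str) -> bool:
    """Return True when the method is meant for retrieval only."""
    return method in SAFE_METHODS


def choose_method(incurs_obligation: bool) -> str:
    """Pick a method: retrieval stays on GET, obligations go through POST."""
    return "POST" if incurs_obligation else "GET"


if __name__ == "__main__":
    print(choose_method(incurs_obligation=False))  # following a link
    print(choose_method(incurs_obligation=True))   # e.g., placing an order
```

Under this design, following a link never uses an unsafe method, so a user agent can retrieve representations without asking the user to accept any obligation.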
    Issue deepLinking-25:
    What to say in defense of the principle that deep linking is not an
    illegal act?
    Editor's note: Need to say something
    about difference between assertions about a resource and assertions
    about a representation. E.g., do not use the same URI to refer to
    the resource "Moby Dick" and to the particular representation of
    that resource, or do not use the same URI to refer to a person and
    to that person's mailbox.
    2.2.3. Choice of URI or absolute URI reference
    When comparison is expected to be the sole or primary operation
    on an absolute URI reference, it does not matter whether one has
    chosen a URI or an absolute URI reference to identify a
    resource.
    When one expects to interact with a resource, there are some
    advantages to identifying that resource with a URI rather than an
    absolute URI reference: only URIs work with intermediaries in the
    Web architecture (e.g., proxies) or with redirection (in HTTP, for
    example).
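One concrete reason behind this advice is that a fragment identifier is not sent in an HTTP request, so servers, proxies, and redirects only ever see the URI part of an absolute URI reference. The sketch below builds the request line an HTTP/1.1 client would send for a reference; the function name request_line and the example reference are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def request_line(reference: str) -> str:
    """Build the HTTP/1.1 request line for an absolute URI reference."""
    parts = urlsplit(reference)
    target = parts.path or "/"        # an empty path is requested as "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays with the client.
    return f"GET {target} HTTP/1.1"


if __name__ == "__main__":
    print(request_line("http://example.org/doc?x=1#section1"))
```

Because the fragment stays with the client, an intermediary cannot act on it.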
    2.2.4. Absolute URI references and context-sensitivity
    Each absolute URI reference unambiguously identifies one
    resource, but the resource itself may be defined in a
    context-sensitive manner. For resources of this type, the result of
    a dereference operation may vary by context. Thus,
    may unambiguously
    identify "the nearest pizza restaurant", but the result of a
    retrieval operation may vary (e.g., it may change with the
    geographical position of the retrieving agent). Similarly,
    and
    file:/etc/hosts
    each identify one resource, but that resource is local to a
    particular computer, so dereference results will vary.
    Context-sensitive absolute URI references can be useful (e.g.,
    when one needs to find pizza or talk about host names in Unix
    environments). However, on the public Internet, an identifier such
    as
    file:/etc/hosts
    is a poor choice for the generic
    resource "host information" because, in many contexts (i.e., most
    non-Unix operating systems), host information is not maintained in
    a file named /etc/hosts.
    Be aware of context-sensitivity in absolute URI references:
    Owners and users of absolute URI references SHOULD
    ensure that any context-sensitivity of these identifiers is
    appropriate.
    2.2.5. Consistent representations
    The representations of a resource may vary as a function of
    factors including time, the identity of the agent accessing the
    resource, data submitted to the resource when interacting with it,
    and changes external to the resource. For example, for the resource
    "the weather forecast for Oaxaca, Mexico," the representations
    depend on (at least) time, the expressed preference of the user for
    Fahrenheit or Celsius, the identity of the user-agent software
    receiving the representation, and, presumably, the weather in
    Oaxaca.
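The weather example can be sketched as a single resource state from which several representations are derived. The Forecast class, the represent function, and the choice of a plain-text and a JSON representation are all hypothetical; the point is only that the unit preference and the format change the representation, not the underlying state.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Forecast:
    """Hypothetical resource state: the forecast high, stored in Celsius."""
    high_celsius: float


def represent(state: Forecast, unit: str = "C", media_type: str = "text/plain") -> str:
    """Derive one representation of the state for a unit and a format."""
    value = state.high_celsius if unit == "C" else state.high_celsius * 9 / 5 + 32
    if media_type == "application/json":
        return json.dumps({"high": round(value), "unit": unit})
    return f"High: {round(value)} {unit}"


if __name__ == "__main__":
    today = Forecast(high_celsius=30.0)
    print(represent(today))
    print(represent(today, unit="F", media_type="application/json"))
```

Both outputs describe the same forecast, which is one possible reading of what consistency between representations means.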
    Use consistent representations: There is
    a strong expectation of consistency between the representations of
    a resource; to the extent possible, representations SHOULD be
    equivalent.
    Editor's note: Need to clarify what "equivalent" means in the previous sentence.
    Yes! See note above.
    2.3. Persistence
    There is a difference between changes in representations of a
    resource and changes in the binding between an absolute URI
    reference and a resource. The absolute URI reference
    identifies the resource "the W3C
    home page." A representation retrieved today for that absolute URI
    reference is likely to differ from one you get tomorrow, since W3C
    updates its home page frequently with news items. Though the news
    changes, the resource remains "the W3C home page".
    On the other hand, if tomorrow, the same absolute URI reference
    identified a different resource (for example, because the domain
    was sold and the new owner decided to assert a different
    URI-Resource relationship), the identifier would lose value. This
    type of indiscriminate reuse of identifiers undermines their value
    and interferes with people who relied on them.
    There are strong social expectations that once an absolute URI
    reference identifies a particular resource, it should continue
    indefinitely to refer to that resource; this is called the
    persistence of the absolute URI reference.
    Persistence is always a matter of policy and
    commitment on the part of authorities assigning URIs rather than a
    constraint imposed by technological means.
    Support persistence: Those who create
    and manage resources and their identifiers SHOULD design the
    identifiers in such a way as to ensure their persistence.
    For example, each W3C technical report (e.g., "the SVG
    specification") is in fact a series of documents that mature over
    time (from Working Drafts, Candidate Recommendations, Proposed
    Recommendations, to Recommendation). W3C assigns an absolute URI
    reference to the "latest version" in the series (e.g.,
    ). W3C also assigns an
    absolute URI reference for each specification in the series (called
    the "this version URI", e.g.,
    ). W3C
    policy is that representations of the "latest version" resource
    will change over time (with each new publication of an SVG
    specification). W3C policy is also that representations of a
    specification designated by a "this version" identifier will not
    change over time, to the best of W3C's ability to maintain its
    archives intact.
    For more discussion about persistence, refer to [Cool].
    2.4. URI Schemes
    One important characteristic of a URI is its scheme
    (the string that precedes the first colon in a URI).
    For example the scheme of the URI
    is "http", and for
    ftp://ftp.example.com/
    it is "ftp". It is common to
    classify URIs by scheme, calling the two preceding examples
    respectively an "HTTP URI" and an "FTP URI".
    Correct processing of URIs is often scheme-dependent, and since
    a huge range of software is expected to be able to process URIs,
    the cost of introduction of new URI schemes is very high.
    Avoid unnecessary new URI schemes:
    Authors of specifications SHOULD avoid introducing new URI schemes
    when existing schemes can be used to meet the goals of the
    specifications.
    While "myscheme:blort" is a URI that satisfies the syntactic
    constraints of [RFC2396], if
    "myscheme" is not registered, you don't have license to use that
    URI in any Internet protocols; there aren't any valid uses of it.
    You can't expect anybody to know what you mean by it, and you
    aren't guaranteed that somebody else isn't already using it for
    something else.
    Do not use unregistered URI schemes:
    Unregistered URI schemes MUST NOT be used on the public
    Internet.
    The IANA registry [IANASchemes] lists URI schemes and the
    specifications that define them. For instance, the HTTP URI scheme
    is defined in section 3.2.2 of the HTTP specification [RFC2616].
    Refer to RFC2717 for information about registering a new
    URI scheme.
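A tool that publishes or validates identifiers could check the scheme against the registry before accepting a URI. The sketch below is illustrative only: REGISTERED_SCHEMES is a small hand-picked subset standing in for the IANA registry, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Illustrative subset only; the IANA registry is the authoritative list.
REGISTERED_SCHEMES = frozenset(
    {"http", "ftp", "mailto", "news", "tel", "urn", "ldap", "file"}
)


def uses_registered_scheme(uri: str) -> bool:
    """Return True when the URI's scheme appears in the local subset."""
    return urlsplit(uri).scheme in REGISTERED_SCHEMES


if __name__ == "__main__":
    print(uses_registered_scheme("mailto:nobody@example.org"))
    print(uses_registered_scheme("myscheme:blort"))
```

A production check would need to load the current registry rather than rely on a fixed set, since schemes are added over time.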
    The deployment and use of different URI schemes may require
    varying degrees of central coordination and administration. For
    example, MAILTO, FTP, and HTTP URIs depend (in practice at least)
    on the use of the DNS infrastructure. Also, there is a central
    registry of URN subclasses.
    URN subclasses are referred to as "namespaces". They are identified by namespace identifiers, or NIDs [RFC2141] [RFC2611].
    Issue httpRange-14:
    What is the range of HTTP URIs? Some URI schemes are used to
    identify specific classes of resources. Two views held within the
    TAG are that the range of HTTP URIs is (1) anything or (2)
    "documents," used in a very broad sense.
    2.5. Fragment identifiers
    In some URI schemes it is meaningful for an absolute URI
    reference to end with a fragment identifier. The fragment
    identifier is interpreted only after the retrieval
    of a representation. Section 4.1 of [RFC2396] states that "the format and
    interpretation of fragment identifiers is dependent on the media
    type [RFC2046] of the retrieval result," that is, the
    representation.
    For instance, if the representation is an HTML document, the
    fragment identifies a hypertext anchor. In the case of a graphics
    format, the fragment might identify a circle or spline. In the
    Resource Description Framework [RDF10], fragments can be used to identify
    anything, be it abstract (e.g., a dream) or concrete (e.g., an
    automobile).
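The dependence of fragment interpretation on media type can be sketched as a lookup performed only after a representation, and therefore its media type, is known. The table FRAGMENT_KINDS and the function interpret_fragment are hypothetical, and the descriptions are simplified labels rather than definitions from any format specification.

```python
from urllib.parse import urldefrag

# Hypothetical, simplified labels for what a fragment names per media type.
FRAGMENT_KINDS = {
    "text/html": "hypertext anchor",
    "image/svg+xml": "graphical element",
}


def interpret_fragment(reference: str, media_type: str):
    """Describe what the fragment names, given the retrieved media type."""
    _, fragment = urldefrag(reference)
    if not fragment:
        return None  # nothing to interpret
    kind = FRAGMENT_KINDS.get(media_type, "unknown")
    return f"{kind} '{fragment}'"


if __name__ == "__main__":
    ref = "http://example.org/doc#section1"
    print(interpret_fragment(ref, "text/html"))
    print(interpret_fragment(ref, "image/svg+xml"))
```

The same reference yields two different readings here, which is the situation that arises when content negotiation serves media types whose fragment semantics differ.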
    Good practice note. Be aware of content negotiation and fragment
    semantics: Authors SHOULD NOT use HTTP
    content negotiation for different media types that do not share the
    same fragment identifier semantics.
    Editor's note: There has been some
    discussion but no agreement that new access protocols should
    provide a means to convert fragment identifiers according to media
    type.
    2.6. Some generalities about absolute URI references
    The following generalities about absolute URI references are
    included to answer some frequently asked questions about URIs. Some
    of these generalities do not hold for all URI schemes.
    The authority over an absolute URI reference determines which
    resource it identifies.
    It is not possible to inspect an absolute URI reference and
    determine what resource it identifies. For example, in general, one
    cannot look at
    and know
    that it refers to "my old car" or "the weather forecast for
    Oaxaca."
    Over time, we trust that some absolute URI references will
    identify familiar resources, but that trust derives from social
    behavior, not the spelling of the identifier.
    Several different absolute URI references can identify the same
    resource.
    It is possible to compare two absolute URI references to see
    whether they are spelled equivalently; see the section on
    comparison of identifiers for more details.
    It is not possible to inspect two absolute URI references that
    are spelled differently and determine whether they identify the
    same resource. This does not prevent some URI schemes from
    mandating equivalence for particular sets of URIs using that
    scheme.
    It is not possible to inspect an absolute URI reference and
    know the media type of representation(s) of that resource. For
    example, do not assume that an absolute URI reference that ends
    with the string ".html" refers to a resource that has an HTML
    representation. Of course, resource owners should not publish
    absolute URI references likely to cause confusion.
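The last point can be illustrated with a sketch that treats the media type metadata sent with a representation as authoritative and falls back to the spelling of the identifier only as a guess. The function name media_type_of and the example URI are assumptions; mimetypes.guess_type is the Python standard library's suffix-based lookup.

```python
import mimetypes


def media_type_of(uri: str, content_type=None) -> tuple:
    """Return (media type, source); metadata wins over the URI's spelling."""
    if content_type:
        bare = content_type.split(";", 1)[0].strip().lower()
        return bare, "header"
    guess, _encoding = mimetypes.guess_type(uri)  # suffix lookup, may be None
    return guess, "suffix guess"


if __name__ == "__main__":
    uri = "http://example.org/report.html"  # hypothetical URI
    print(media_type_of(uri, "image/png"))  # metadata contradicts the suffix
    print(media_type_of(uri))               # only a guess without metadata
```

Labeling the fallback as a guess keeps callers from assuming that the ".html" suffix implies an HTML representation.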
    3. Formats
    I assume this section is incomplete.
    3.1. Scope
    What is a format, and how does it relate to the concept of a
    document? Do all documents have a format? Is a document a
    collection of resources of different formats organized into a
    whole? Is a document the same as a resource? the same as a message
    body? as a non-multipart message body? What is the distinction
    between documents and data, if any? Does 'document' imply human
    readable and if so, does it imply presentation? Does it imply a
    hierarchically structured, report-like document with headings and
    subheadings? Is a catalog a document? Is a rave flyer a
    document?
    Negotiation (the material above might go here also): by network request, or by listed alternatives in content; is there any preference? Resource variants: foo.css and foo.html are unlikely to be equivalent.
    3.2.
    Content, Presentation,
    and Interaction
    This section attempts to organize some areas of future
    discussion. Separating the concepts content, presentation, and
    interaction allows more easily composable specifications. For
    example, a markup language can be specified independently of a
    style sheet language. The separation facilitates alternate
    presentations of the same content, which is seen to have an
    accessibility advantage and to be more suited to the multiple
    modalities of Web access.
    Issue
    contentPresentation-26
    Separation of semantic and presentational markup, to the extent
    possible, is architecturally sound.
    3.2.1.
    Content
    Composability (ns-meaning). Use of XML for tree structured
    content. Linking in general v. idref in one document. Human
    readable v. machine data. Served or not (hidden behind server -
    semantic firewall), accessibility. Linking into parts of the
    content, transclusion of parts. Compound documents, components from
    multiple servers - scalability, deep linking. Processing models,
    error handling.
    3.2.2.
    Presentation
    Presentation by decoration (application of CSS to XML as
    presentation), and by derivation (creation of html/svg/etc as
    presentation). Linking (bidirectionally) between content and
    presentations. Inheritance of properties across namespaces.
    Consistency of property names. Subsets. 'Applies to' as opposed to
    'set on'. Specificity of properties as attributes, chaining
    styling, restyling. Time-lines, linking to portions of a
    time-line.
    3.2.3.
    Interaction
    Animation, scripting, events, client/server interaction.
    Declarative v. script based - accessibility, power; formalization
    of common functionality (loop animation, rollovers) in declarative
    form. DOM - when adding methods, extend rather than replace the
    XML DOM. Effect of script/programming language limitations on
    choice of element and attribute names. Linking to active components
    - XForms example with model and abstract form control, can be
    extended to presentational instantiation of form control.
    3.3.
    Ideas and issues
    For new format specifications, use XML family of specifications
    unless there's a good reason not to. Which XML specifications?
    Which particular family members?
    Format designers should use URIs without constraining content
    providers to particular URI schemes. What does "use" mean? IDREF v.
    linking - web-wide rather than document-wide references.
    Namespaces. Issues
    namespaceDocument-8
    mixedNamespaceMeaning-13
    Qnames: Issues
    rdfmsQnameUriMapping-6
    qnameAsId-18
    and finding "
    Using QNames as
    Identifiers in Content
    ".
    Formatting properties: Issue
    formattingProperties-19
    contentPresentation-26
    Error handling: Issue
    errorHandling-20
    Media type registration:
    RFC3023Charset-21
    finding
    Internet Media
    Type registration, consistency of use
    . Also, make sure to
    define fragment identifier semantics.
    Effect of Mobile on architecture - size, complexity, memory
    constraints. Binary infosets, storage efficiency. Composable
    subsets.
    What is the scope of using XLink?
    xlinkScope-23
    Can a specification include rules for overriding HTTP content
    type parameters?
    contentTypeOverride-24
    Create formats that allow authors to hide URIs from view (e.g.,
    behind link text). For authors: at times it is useful or necessary
    to reveal a URI (e.g., in an advertisement on the side of a bus),
    in which case, good social behavior requires that the URI be easy
    to use.
    4.
    Protocols
    As mentioned in the introduction, the Web is designed to create
    the large-scale effect of a shared information space that scales
    well and behaves predictably.
    The architectural style known as
    Representational State Transfer
    [REST] encapsulates this notion of a
    shared information space.
    According to Fielding:
    REST provides a set of architectural constraints that, when
    applied as a whole, emphasizes scalability of component
    interactions, generality of interfaces, independent deployment of
    components, and intermediary components to reduce interaction
    latency, enforce security, and encapsulate legacy systems.
    -- Roy Fielding, Section 5.5 of [
    REST]
    HTTP has been specially designed for REST interactions. HTTP
    offers a variety of ways to
    interact with a resource
    including GET, POST, PUT, and DELETE.
    The following sections use the REST model to explain how Web
    protocols take into account the properties of resources and URIs,
    as well as real-world time and space constraints, in order to
    improve the user's Web experience.
    4.1.
    REST constraints
    The REST constraints are:
    Client/server model
    REST separates rendering concerns from the data model and
    control logic.
    Stateless protocols
    Each request from client to server contains all the necessary
    data for a server to understand the request.
    Caching
    Some representations may be cached. Intermediaries may respond
    on behalf of a server with the cached data.
    Uniform Interface
    The consistent constraints on interface between components,
    specifically resource identification, resource manipulation through
    representations, self-describing messages, and messages as the
    embodiment of application state.
    Layering
    The encapsulation of each component so that components "know"
    only about the components with which they are interacting.
    Optional Code-on-demand
    Clients may download and execute code (such as Java Applets,
    ActiveX controls, scripts, and XSLT).
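    A few of these constraints (stateless requests, caching by an
    intermediary, a uniform interface, and layering) can be sketched in
    Python. This is a toy model, not HTTP: the Request and Response
    types, the cacheable flag, and the example.org URI are all
    assumptions made for illustration, and real caching rules (see
    [RFC2616]) are considerably more detailed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Request:
    method: str
    uri: str


@dataclass(frozen=True)
class Response:
    status: int
    media_type: str
    body: str
    cacheable: bool  # hypothetical stand-in for real cache metadata


# Uniform interface: every component is a function from Request to Response.
Handler = Callable[[Request], Response]


def origin_server(request: Request) -> Response:
    """Stateless: the reply depends only on what the request carries."""
    if request.method == "GET" and request.uri == "http://example.org/forecast":
        return Response(200, "text/plain", "sunny", cacheable=True)
    return Response(404, "text/plain", "not found", cacheable=False)


def caching_intermediary(upstream: Handler) -> Handler:
    """Layering: callers cannot tell a cache from the origin server."""
    cache: Dict[Tuple[str, str], Response] = {}

    def handle(request: Request) -> Response:
        key = (request.method, request.uri)
        if key in cache:
            return cache[key]  # answered on behalf of the server
        response = upstream(request)
        if request.method == "GET" and response.cacheable:
            cache[key] = response
        return response

    return handle
```

    Because the intermediary has the same shape as the origin server,
    a client written against Handler works unchanged with either, or
    with several intermediaries stacked in front of the origin.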
    REST focuses on the roles of components, the constraints upon
    their interaction with other components, and their interpretation
    of significant data elements. REST ignores the details of component
    implementation and protocol syntax. REST components communicate by
    transferring a
    representation of a resource
    selected dynamically based on the capabilities or desires of the
    recipient and the nature of the resource. Whether the
    representation is in the same format as the raw source, or is
    derived from the source, remains hidden behind the interface.
    Typical hypertext systems support one of three possible styles
    of data representation:
    render the data where it is located and send a fixed-format
    image to the recipient,
    encapsulate the data with a rendering engine and send both to
    the recipient, or
    send the raw data to the recipient along with metadata that
    describes the data type, so that the recipient can choose their own
    rendering engine.
    The Web provides a hybrid of all three options by focusing on a
    shared understanding of data types with metadata, but limiting the
    scope of what is revealed to a standardized interface.
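    The third style, in which the recipient chooses a rendering engine
    from the metadata that accompanies the data, might look like the
    following Python sketch. The renderer table and the placeholder
    output strings are hypothetical; only the idea of dispatching on a
    shared understanding of data types comes from the text above.

```python
from typing import Callable, Dict

Renderer = Callable[[bytes], str]

# Hypothetical renderers chosen by the recipient, keyed by media type.
RENDERERS: Dict[str, Renderer] = {
    "text/plain": lambda data: data.decode("utf-8"),
    "text/html": lambda data: "[HTML document, %d bytes]" % len(data),
}


def render(media_type: str, data: bytes) -> str:
    """Pick a renderer from the declared media type, not from the data."""
    renderer = RENDERERS.get(media_type)
    if renderer is None:
        return "[no renderer for %s]" % media_type
    return renderer(data)
```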
    Web components perform various roles in interactions. User
    agents, gateways, proxies, and origin servers are the main roles
    that a component can act in. A component may act in different roles
    depending upon the interaction.
    4.2.
    Ideas and issues
    Consistency of media types and message contents (from "
    TAG
    Finding: Internet Media Type registration, consistency of
    use
    ").
    Consistency of communicating character encoding (same
    source).
    HTTP as a substrate protocol [
    TAG issue
    HTTPSubstrate-16
    ].
    5.
    Glossary
    Absolute URI Reference
    a URI followed optionally by a fragment identifier.
    Agents
    programs acting on behalf of another person, entity, or
    process
    Dereference
    To dereference an absolute URI reference is to interact with the resource it identifies.
    This definition seems incomplete: I think the interaction returns a representation and does not change the state of the resource.
    Internet Media Type
    the metadata/packaging system [RFC2046].
    As mentioned above, this seems an odd use of terminology. Also, the main reference for the "system" is RFC 2045.
    Link
    When one resource refers to another via an absolute URI
    reference, a link is formed.
    Persistence
    There are strong social expectations that once an absolute URI reference identifies a particular resource, it should continue indefinitely to refer to that resource; this is called the persistence of the absolute URI reference.
    I tend to view persistence of a URI as avoiding its reuse for a different purpose. This somehow seems more realistically achievable.
    REST
    The architectural style known as Representational State Transfer [REST] encapsulates this
    "this"?
    notion of a shared information space.
    Representation
    One interacts with a resource by the exchange of representations of resource state.
    This doesn't read as a glossary entry for "representation". I'd expect something like: "a data object that represents or describes a state of a resource".
    Resource
    A resource is defined by [RFC2396]
    (missing hyperlink?)
    to be anything that has identity.
    Retrieve a representation
    to retrieve an snapshot
    "an snapshot"?
    of a state of the resource.
    Suggest something like: "An interaction with a resource that returns a representation of its state".
    URI Scheme
    One important characteristic of a URI is its scheme (the string
    that precedes the first colon in a URI).
    Uniform Resource Identifier
    (URI)
    a character sequence starting with a scheme name, followed by a
    number of scheme-specific fields.
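    The glossary entries for "Absolute URI Reference" and "URI Scheme"
    can be read together as a small Python sketch: split off an
    optional fragment identifier, then take the scheme as the string
    before the first colon. The function name and the example.org URIs
    are hypothetical, and the sketch does not validate scheme syntax
    as [RFC2396] would.

```python
from typing import Optional, Tuple


def split_absolute_uri_reference(ref: str) -> Tuple[str, str, Optional[str]]:
    """Return (scheme, uri, fragment) for an absolute URI reference.

    fragment is None when the reference has no "#" part.
    """
    uri, hash_sign, fragment = ref.partition("#")
    scheme, colon, _ = uri.partition(":")
    if not colon or not scheme:
        raise ValueError("not an absolute URI reference: %r" % ref)
    return scheme, uri, (fragment if hash_sign else None)


parts = split_absolute_uri_reference("http://www.example.org/doc#intro")
```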
    6.
    References
    6.1.
    Normative
    References
    IANASchemes
    IANA's
    online registry
    of URI Schemes
    is available at
    Dan
    Connolly's list of URI schemes
    is a useful resource for finding
    out which references define various URI schemes.
    RFC2046
    IETF "
    RFC 2046: Multipurpose
    Internet Mail Extensions (MIME) Part Two: Media Types
    ", N.
    Freed, N. Borenstein, November 1996. Available at
    RFC2119
    IETF "
    RFC 2119: Key words for
    use in RFCs to Indicate Requirement Levels
    ", S. Bradner, March
    1997. Available at http://www.ietf.org/rfc/rfc2119.txt.
    RFC2396
    IETF "
    RFC 2396: Uniform
    Resource Identifiers (URI): Generic Syntax
    ", T. Berners-Lee, R.
    Fielding, L. Masinter, August 1998. Available at
    RFC2616
    IETF "
    RFC 2616: Hypertext
    Transfer Protocol -- HTTP/1.1
    ", R. Fielding, J. Gettys, J. Mogul, H.
    Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999.
    Available at http://www.ietf.org/rfc/rfc2616.txt.
    RFC2717
    IETF "
    Registration Procedures
    for URL Scheme Names
    ", R. Petke, I. King, November 1999.
    Available at http://www.ietf.org/rfc/rfc2717.txt.
    6.2.
    Non-Normative References
    Axioms
    Universal Resource
    Identifiers - Axioms of Web Architecture
    ", T. Berners-Lee,
    living document dated December 1996. Available at
    Cool
    Cool URI's don't
    change
    ", T. Berners-Lee, W3C, 1998. Available at
    CSS2
    Cascading Style
    Sheets, level 2
    ", B. Bos, H. Lie, C. Lilley, I. Jacobs, 12 May
    1998. This W3C Recommendation is available at
    Eng90
    Knowledge-Domain
    Interoperability and an Open Hyperdocument System
    ", D. C.
    Engelbart, June 1990.
    Fielding
    Principled
    Design of the Modern Web Architecture
    ", R.T. Fielding and R.N.
    Taylor, UC Irvine. In Proceedings of the 2000 International
    Conference on Software Engineering (ICSE 2000), Limerick, Ireland,
    June 2000, pp. 407-416. This document is available at
    Fragments
    Fragment Identifiers
    on URIs
    ", T. Berners-Lee, living document dated April 1997.
    Available at http://www.w3.org/DesignIssues/Fragment
    HTML40
    HTML 4.01
    Specification
    ", D. Raggett, A. Le Hors, I. Jacobs, 24 December
    1999. This W3C Recommendation is available at
    P3P10
    The Platform for
    Privacy Preferences 1.0 (P3P1.0) Specification
    ", M. Marchiori,
    ed., 16 April 2002. This W3C Recommendation is available at
    RDF10
    Resource
    Description Framework (RDF) Model and Syntax Specification
    ", O.
    Lassila, R. R. Swick, eds., 22 February 1999. This W3C
    Recommendation is available at
    REST
    Representational State Transfer (REST)
    ", Chapter 5 of
    "Architectural Styles and the Design of Network-based Software
    Architectures", Doctoral Thesis of R. T. Fielding, 2000.
    RFC2141
    IETF "
    RFC 2141: URN
    Syntax
    ", R. Moats, May 1997. Available at
    RFC2718
    Guidelines for new URL
    Schemes
    ", L. Masinter, H. Alvestrand, D. Zigmond, R. Petke,
    November 1999. Available at:
    RFC3236
    IETF "
    RFC 3236: The
    'application/xhtml+xml' Media Type
    ", M. Baker, P. Stark,
    January 2002. Available at:
    SVG10
    Scalable Vector
    Graphics (SVG) 1.0 Specification
    ", J. Ferraiolo, ed., 4 Sep
    2001. This W3C Recommendation is available at
    UniqueDNS
    IAB Technical Comment on the Unique DNS Root"
    , B. Carpenter, 27
    Sep 1999.
    XHTML10
    XHTML 1.0:
    The Extensible HyperText Markup Language: A Reformulation of HTML 4
    in XML 1.0
    ", S. Pemberton et al., 26 January 2000. The latest
    version of this W3C Recommendation is available at
    XLink10
    XML Linking
    Language (XLink) Version 1.0
    ", S. DeRose, E. Maler, D. Orchard,
    27 June 2001. This W3C Recommendation is available at
    XML10
    Extensible Markup
    Language (XML) 1.0 (Second Edition)
    ", T. Bray, J. Paoli, C.M.
    Sperberg-McQueen, E. Maler, 6 October 2000. This W3C Recommendation
    is available at http://www.w3.org/TR/2000/REC-xml-20001006.
    XMLNS
    Namespaces
    in XML
    ", T. Bray, D. Hollander, A. Layman, 14 Jan 1999. This
    W3C Recommendation is available at
    W3CPROCESS
    W3C Process Document
    ", 19 July 2001 Version.
    7.
    End notes
    This principle
    dates back at least as far as Douglas Engelbart's seminal work on
    open hypertext systems; see section
    Every
    Object Addressable
    in [
    Eng90
    ]. (
    Note 1
    context.)
    The
    title is somewhat misleading. It's not the URIs that change, it's
    what they identify. (
    Note 2 context.)
    8.
    Acknowledgments
    The authors of this document are the participants of W3C's
    Technical Architecture Group: Tim Berners-Lee (Chair, W3C), Tim
    Bray (Antarctica Systems), Dan Connolly (W3C), Paul Cotton
    (Microsoft), Roy Fielding (Day Software), Chris Lilley (W3C), David
    Orchard (BEA Systems), Norman Walsh (Sun), and Stuart Williams
    (Hewlett-Packard).
    The TAG thanks people for their thoughtful contributions on the
    TAG's public mailing list, www-tag (
    archive
    ).