RDFa 1.1 Primer - Third Edition
Rich Structured Data Markup for Web Documents
W3C
Working Group Note
17 March 2015
Editors:
Ivan Herman
W3C
ivan@w3.org
Ben Adida
Creative Commons
ben@adida.net
Manu Sporny
Digital Bazaar
msporny@digitalbazaar.com
Mark Birbeck
webBackPlane.com
mark.birbeck@webBackplane.com
Please check the errata for any errors or issues reported since publication.
This document is also available in this non-normative format: diff to previous version.
Copyright © 2010-2015 W3C (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document
use rules apply.
Abstract
The last couple of years have witnessed a fascinating evolution: while the Web was initially
built predominantly for human consumption, web content is increasingly consumed by machines
which expect some amount of structured data. Sites have started to identify a page's title,
content type, and preview image to provide appropriate information in a user's newsfeed when
she clicks the "Like" button. Search engines have started to provide richer search results by
extracting fine-grained structured details from the Web pages they crawl. In turn, web
publishers are producing increasing amounts of structured data within their Web content to
improve their standing with search engines.
A key enabling technology behind these developments is the ability to add structured data to
HTML pages directly. RDFa (Resource Description Framework in Attributes) is a technique that
allows just that: it provides a set of markup attributes to augment the visual information on
the Web with machine-readable hints. In this Primer, we show how to express data using RDFa
in HTML, and in particular how to mark up existing human-readable Web page content to express
machine-readable data.
This document provides only a Primer to RDFa 1.1. The complete specification of RDFa, with
further examples, can be found in the RDFa 1.1 Core [rdfa-core], RDFa Lite [rdfa-lite],
XHTML+RDFa 1.1 [xhtml-rdfa], and the HTML5+RDFa 1.1 [html-rdfa] specifications.
Status of This Document
This section describes the status of this document at the time of its publication.
Other documents may supersede this document. A list of current W3C publications and the
latest revision of this technical report can be found in the W3C technical reports index.
This document was published by the RDFa Working Group as a Working Group Note.
If you wish to make comments regarding this document, please send them to
public-rdfa@w3.org (archives). All comments are welcome.
Publication as a Working Group Note does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or obsoleted by other
documents at any time. It is inappropriate to cite this document as other than work in
progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent
Policy. W3C maintains a public list of any patent disclosures made in connection with the
deliverables of the group; that page also includes instructions for disclosing a patent. An
individual who has actual knowledge of a patent which the individual believes contains
Essential Claim(s) must disclose the information in accordance with section 6 of the W3C
Patent Policy.
This document is governed by the 14 October 2005 W3C Process Document.
Table of Contents
1. Introduction
1.1 HTML vs. XHTML
1.2 Validation
2. Using RDFa
2.1 The Basics of RDFa: RDFa Lite
2.1.1 The First Steps: Adding Machine-Readable Hints to Web Pages
2.1.1.1 Hints on Social Networking Sites
2.1.1.2 Links with Flavor
2.1.1.3 Setting a Default Vocabulary
2.1.1.4 Multiple Items per Page
2.1.2 Exploring Further: Social networks
2.1.2.1 Contact Information
2.1.2.2 Describing Social Networks
2.1.3 Repeated Patterns
2.1.4 Internal References
2.1.5 Using Multiple Vocabularies
2.1.5.1 Repeating properties
2.1.5.2 Default Prefixes (Initial Context)
2.2 Going Deeper: RDFa Core
2.2.1 Using the content attribute
2.2.2 Datatypes
2.2.3 Alternative for setting the context: about
2.2.4 Alternative for setting the property: rel
3. You Said Something about RDF?
3.1 Custom Vocabularies
4. RDFa Tools
5. Acknowledgments
A. References
A.1 Informative references
1. Introduction
The web is a rich, distributed repository of interconnected information. Until recently, it
was organized primarily for human consumption. On a typical web page, an HTML author might
specify a headline, then a smaller sub-headline, a block of italicized text, a few paragraphs
of average-size text, and, finally, a few single-word links. Web browsers will follow these
presentation instructions faithfully. However, only the human mind understands that the
headline is a blog post title, that the sub-headline indicates the author, that the italicized
text is the article's publication date, and that the single-word links are subject categories.
Computers do not perceive these distinctions; the gap between what programs and humans
understand is large.
Figure 1: On the left, what browsers see. On the right, what humans see. Can we bridge the
gap so that browsers see more of what we see?
[Image: presentation vs. semantics]
What if the browser, or any machine consumer such as a Web crawler, received information on
the meaning of a web page's visual elements? A dinner party announced on a blog could be
copied to the user's calendar, an author's complete contact information to the user's address
book. Users could automatically recall previously browsed articles according to
categorization labels (i.e., tags). A photo copied and pasted from a web site to a school
report would carry with it a link back to the photographer, giving him proper credit. A link
shared by a user to his social network contacts would automatically carry additional data
pulled from the original web page: a thumbnail, an author, and a specific title. When web
data meant for humans is augmented with hints meant for computer programs, these programs
become significantly more helpful, because they begin to understand the data's structure.
RDFa allows HTML authors to do just that. Using a few simple HTML attributes, authors can
mark up human-readable data with machine-readable indicators for browsers and other programs
to interpret. A web page can include markup for items as simple as the title of an article,
or as complex as a user's complete social network.
1.1 HTML vs. XHTML
Historically, RDFa 1.0 [rdfa-syntax] was specified only for XHTML. RDFa 1.1 [rdfa-core]
is the newer version and the one used in this document. RDFa 1.1 is specified for both
XHTML [xhtml-rdfa] and HTML5 [html-rdfa]. In fact, RDFa 1.1 also works for any XML-based
language such as SVG [svg11]. This document uses HTML in all of the examples; for
simplicity, we use the term "HTML" throughout this document to refer to all of the
HTML-family languages.
1.2 Validation
RDFa is based on attributes. While some of the HTML attributes (e.g., href, src) have
been re-used, other RDFa attributes are new. This is important
because some of the (X)HTML validators may not properly validate the HTML code until they
are updated to recognize the new RDFa attributes. This is rarely a problem in practice
since browsers simply ignore attributes that they do not recognize. None of the
RDFa-specific attributes have any effect on the visual display of the HTML content.
Authors do not have to worry about pages marked up with RDFa looking any different to a
human being from pages not marked up with RDFa.
2. Using RDFa
2.1 The Basics of RDFa: RDFa Lite
We begin the introduction to RDFa with a subset of its features called RDFa Lite 1.1
[rdfa-lite]. The goal in defining that subset was to cover most simple to moderately
complex structured data markup tasks without burdening authors with additional
complexity. Many Web authors will not need to use more than this minimal subset.
2.1.1 The First Steps: Adding Machine-Readable Hints to Web Pages
Consider Alice, a blogger who publishes a mix of professional and personal articles
on her blog. We will construct markup examples to illustrate how Alice can use RDFa.
A more complete markup of these examples is available on a dedicated page.
2.1.1.1 Hints on Social Networking Sites
Alice publishes a blog and would like to provide extra structural information on
her pages, such as the publication date or the title. She would like to use the terms
defined in the Dublin Core vocabulary [dc11], a set of terms that are widely
used by, for example, the publishing industry or libraries. Her blog already
contains that information:
Example 1
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <h2>The Trouble with Bob</h2>
    <p>Date: 2011-09-10</p>
    ...
  </body>
</html>
This information is, however, aimed at humans only; computers need some
sophisticated methods to extract it. But, using RDFa, she can annotate her
page to make the structured data clear:
Example 2
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
    <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
    ...
  </body>
</html>
(Notice the property attributes: these are the RDFa "hints".)
One useful way to visualize the structured data is:
Figure 2: A visualization of the structured data for a blog post with a title of
"The Trouble with Bob" and a creation date.
[Image: relationship value is text]
It is worth emphasizing that RDFa uses URLs to identify just about everything.
This is why, instead of just using properties like title or created, we use
http://purl.org/dc/terms/title and http://purl.org/dc/terms/created. The reason behind
this design decision is rooted in data portability, consistency, and information sharing.
Using URLs removes the possibility for ambiguities in terminology. Without such
disambiguation, the term "title" might mean "the title of a work", "a job title", or
"the deed for real-estate property". When each vocabulary term is a URL, a detailed
explanation for the vocabulary term is just one click away: anyone, human or machine,
can follow the link to find out what a particular vocabulary term means. By using a URL
to identify a particular creation time, for example http://purl.org/dc/terms/created,
both humans and machines can understand that the URL unambiguously refers to the
"Date of creating the resource", such as a web page.
By using URLs as identifiers, RDFa provides a solid way of disambiguating
vocabulary terms. It becomes trivial to determine whether or not vocabulary terms
used in different documents mean the same thing. If the URLs are the same, the
vocabulary terms mean the same thing. It also becomes very easy to create new
vocabulary terms and vocabulary documents. If one can publish a document to the
Web, one automatically has the power to create a new vocabulary document
containing new vocabulary terms.
2.1.1.2 Links with Flavor
The previous example demonstrated how Alice can mark up text to make it
machine-readable. She would also like to mark up the links in a machine-readable way, to
express the type of link being described. RDFa lets the publisher add a "flavor",
i.e., a label, to an existing clickable link that processors can understand. This
makes the same markup help both humans and machines.
In her blog's footer, Alice already declares her content to be freely reusable,
as long as she receives due credit when her articles are cited. The HTML includes
a link to a Creative Commons [cc-about] license:
Example 3
<p>All content on this site is licensed under
  <a href="http://creativecommons.org/licenses/by/3.0/">
  a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
A human clearly understands this sentence, in particular the meaning of
the link with respect to the current document: it indicates the document's
license, the conditions under which the page's contents are distributed.
Unfortunately, when Bob visits Alice's blog, his browser sees only a plain link
that could just as well point to one of Alice's friends or to her CV. For Bob's
browser to understand that this link actually points to the document's licensing
terms, Alice needs to add some flavor, some indication of what kind of link this is.
She can add this flavor by again using the property attribute. Indeed,
when the element contains the href (or src) attribute, property is automatically
associated with the value of this attribute rather than the textual content of the
a element. The value of the property attribute is
http://creativecommons.org/ns#license, a term defined by Creative Commons:
Example 4
<p>All content on this site is licensed under
  <a property="http://creativecommons.org/ns#license"
     href="http://creativecommons.org/licenses/by/3.0/">
  a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
With this small update, Bob's browser will now understand that this link has a
flavor: it indicates the blog's license:
Figure 3: A link with flavor: the link indicates the web page's license. We can represent
web pages as nodes, the link as an arrow connecting those nodes, and the link's flavor as
the label on that arrow.
[Image: two Web pages connected by a link labeled 'license' and two nodes with a 'license' relationship]
Alice is quite pleased that she was able to add only structured-data hints via
RDFa, never having to repeat the content of her text or the URL of her clickable
links.
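To make the behavior described in this section concrete, here is a minimal Python sketch of how a consumer might pick the value for each property attribute: the link target when href (or src) is present, the text content otherwise. It is not a conforming RDFa processor. The class name PropertyExtractor and the helper extract are hypothetical, the h2 element in the sample is assumed markup, and the sketch ignores vocab, resource, and nested property elements.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from a fragment of RDFa Lite markup.

    Simplifications (assumptions of this sketch): elements carrying
    `property` are not nested inside each other, and `vocab` and
    `resource` are ignored.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished (property URL, value) pairs
        self._pending = None   # property whose text value is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        # A link target takes precedence over the element's text content.
        target = attributes.get("href") or attributes.get("src")
        if target is not None:
            self.pairs.append((prop, target))
        else:
            self._pending, self._text = prop, []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.pairs.append((self._pending, "".join(self._text).strip()))
            self._pending = None


def extract(html):
    """Return the list of (property, value) pairs found in `html`."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


SAMPLE = """
<h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
<p>All content on this site is licensed under
  <a property="http://creativecommons.org/ns#license"
     href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>.</p>
"""

for prop, value in extract(SAMPLE):
    print(prop, "->", value)
```

The title property takes its value from the heading text, while the license property takes the URL in href, so Alice never repeats either piece of content.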
2.1.1.3 Setting a Default Vocabulary
In a number of simple use cases, such as our example with Alice's blog, HTML
authors will predominantly use a single vocabulary. However, while generating
full URLs via a CMS is not a particular problem, typing these by hand may
be error-prone and tedious for humans. To alleviate this problem, RDFa introduces
the vocab attribute to let the author declare a single vocabulary
for a chunk of HTML. Thus, instead of:
Example 5
<html>
  <head>
    ...
  </head>
  <body>
    ...
    <h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
    <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
    ...
  </body>
</html>
Alice can write:
Example 6
<html>
  <head>
    ...
  </head>
  <body vocab="http://purl.org/dc/terms/">
    ...
    <h2 property="title">The Trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    ...
  </body>
</html>
Note how the property values are single "terms" now; these are simply
concatenated to the URL defined via the vocab attribute. The attribute can be placed
on any HTML element (i.e., not only on the body element as in the example) and its
effect is valid for all the elements below that point.
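The concatenation rule can be sketched in a few lines of Python. This is an illustrative simplification, not the full resolution algorithm of RDFa Core: the function name expand_term is hypothetical, any value containing a colon is assumed to be a full URL (prefixes are not handled), and raising an error for a bare term with no vocabulary in scope is a choice made for this sketch only.

```python
def expand_term(value, vocab=None):
    """Return the full URL for a `property` value.

    Simplified: a value that contains a colon is treated as a full URL and
    returned unchanged; a bare term is appended to the default vocabulary.
    """
    if ":" in value:
        return value
    if vocab is None:
        # Assumption of this sketch: fail loudly instead of ignoring the term.
        raise ValueError(f"no default vocabulary in scope for term {value!r}")
    return vocab + value


DC_TERMS = "http://purl.org/dc/terms/"
print(expand_term("title", DC_TERMS))
print(expand_term("http://purl.org/dc/terms/created", DC_TERMS))
```

Both calls yield URLs in the Dublin Core namespace, which is why a page can mix bare terms and full URLs for the same vocabulary.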
Default vocabularies and full URLs can be mixed at any time. For example, Alice could
have written:
Example 7
<html>
  <head>
    ...
  </head>
  <body vocab="http://purl.org/dc/terms/">
    ...
    <h2 property="title">The Trouble with Bob</h2>
    <p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
    ...
  </body>
</html>
Perhaps a more interesting example is the combination of the header with the
licensing segment of her web page:
Example 8
<html>
  <head>
    ...
  </head>
  <body vocab="http://purl.org/dc/terms/">
    ...
    <h2 property="title">The Trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    ...
    <p>All content on this site is licensed under
      <a property="http://creativecommons.org/ns#license"
         href="http://creativecommons.org/licenses/by/3.0/">
      a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
  </body>
</html>
The full URL for the license term is necessary to avoid mixing vocabularies. As
an alternative, Alice could have also chosen to use the vocab attribute again:
Example 9
<html>
  <head>
    ...
  </head>
  <body vocab="http://purl.org/dc/terms/">
    ...
    <h2 property="title">The Trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    ...
    <p vocab="http://creativecommons.org/ns#">All content on this site is licensed under
      <a property="license"
         href="http://creativecommons.org/licenses/by/3.0/">
      a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>
  </body>
</html>
This works because the vocab in the license paragraph overrides the definition
inherited from the body of the document.
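The override behaves like ordinary lexical scoping: the innermost vocab wins, and the outer one is restored when the element closes. The Python sketch below illustrates this with a stack. It is a hedged approximation rather than a conforming processor: the names VocabScopeResolver and resolve_properties are hypothetical, the sample markup is assumed for illustration, every non-void element is assumed to have an explicit end tag, and bare terms with no vocabulary in scope are simply skipped.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they must not open a new scope.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class VocabScopeResolver(HTMLParser):
    """Resolve bare `property` terms against the nearest enclosing `vocab`."""

    def __init__(self):
        super().__init__()
        self._scopes = [None]   # vocabulary in effect at each nesting level
        self.properties = []    # resolved property URLs, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An element's own vocab overrides the inherited one.
        vocab = attributes.get("vocab", self._scopes[-1])
        prop = attributes.get("property")
        if prop is not None:
            if ":" in prop:            # assumed to be a full URL already
                self.properties.append(prop)
            elif vocab:                # bare term: append to the vocabulary
                self.properties.append(vocab + prop)
            # bare term without a vocabulary in scope: skipped in this sketch
        if tag not in VOID_ELEMENTS:
            self._scopes.append(vocab)

    def handle_endtag(self, tag):
        # Leaving an element restores the vocabulary of its parent.
        if tag not in VOID_ELEMENTS and len(self._scopes) > 1:
            self._scopes.pop()


def resolve_properties(html):
    """Return the resolved property URLs found in `html`."""
    resolver = VocabScopeResolver()
    resolver.feed(html)
    resolver.close()
    return resolver.properties


SAMPLE = """
<body vocab="http://purl.org/dc/terms/">
  <h2 property="title">The Trouble with Bob</h2>
  <p vocab="http://creativecommons.org/ns#">Licensed under
    <a property="license"
       href="http://creativecommons.org/licenses/by/3.0/">a Creative Commons License</a>.</p>
  <p>Date: <span property="created">2011-09-10</span></p>
</body>
"""

for url in resolve_properties(SAMPLE):
    print(url)
```

In the sample, the date paragraph is deliberately placed after the license paragraph to show that the Dublin Core vocabulary is back in effect once the overriding element has closed.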
Note
The vocab attribute references structured data vocabularies, identified using URLs.
RDFa does not limit the form of these URLs or the document formats accessible by de-referencing them;
however, users SHOULD aim to use widely shared, conventional values for identifying such vocabularies,
following conventions of case, spelling etc. established by their publishers.
2.1.1.4 Multiple Items per Page
Alice's blog page may contain, of course, multiple entries. Sometimes, Alice's
sister Eve guest blogs, too. The front page of the blog lists the 10 most recent
entries, each with its own title, author, and introductory paragraph. How, then,
should Alice mark up the title of each of these entries individually even though
they all appear within the same web page? RDFa provides resource, an
attribute for specifying the "context", i.e., the exact URL to which the
contained RDFa markup applies:
Example 10
<div vocab="http://purl.org/dc/terms/">
  ...
  <div resource="/alice/posts/trouble_with_bob">
    <h2 property="title">The trouble with Bob</h2>
    <p>Date: <span property="created">2011-09-10</span></p>
    <h3 property="creator">Alice</h3>
    ...