microformats2 parsing specification
From Microformats Wiki
microformats2 is a simple, open format for marking up data in HTML. The microformats2 parsing specification describes how to implement a microformats2 parser, independent of any specific vocabularies.
Status
This is a Living Specification with several interoperable implementations. This specification is stable, subject to editorial changes only for improving clarity of existing meaning. While substantive changes are unexpected, it is a living specification subject to substantive change by issues and errata filed in response to implementation experience, requiring consensus among participating implementers (since 2015-01-21) as part of an explicit change control process. There are currently no draft or proposed new features in this specification, and if any were to be added, they would be explicitly labeled as such.
Note: This specification is only marked as a "Draft Specification" because of pending edits from resolved issues before 2016-06-20. Once those edits have been completed, the link to [[Category:Draft Specifications]] at the bottom of this document should be changed to [[Category:Specifications]].
Participate
Open Issues
Resolved issues before 2016-06-20
IRC: #microformats on Libera
Editor
Tantek Çelik
License
Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of 2026-04-24, the editors have made this specification available under the Open Web Foundation Agreement Version 1.0.
Contents
1 algorithm
1.1 parse a document for microformats
1.2 parse an element for class microformats
1.3 parse an element for properties
1.3.1 parsing a p- property
1.3.2 parsing a u- property
1.3.3 parsing a dt- property
1.3.4 parsing an e- property
1.3.5 parsing for implied properties
1.4 parse a hyperlink element for rel microformats
1.4.1 rel parse examples
1.5 parse an img element for src and alt
2 what do the CSS selector expressions mean
3 note HTML parsing rules
4 note backward compatibility details
4.1 backward compatibility mappings
5 questions
6 implementations
7 test suite
8 change control
9 see also
algorithm
parse a document for microformats
To parse a document for microformats, follow the HTML parsing rules and do the following:
start with an empty JSON "items" array and "rels" & "rel-urls" hashes:
{
 "items": [],
 "rels": {},
 "rel-urls": {}
}
parse the root element for class microformats, adding to the JSON items array accordingly
parse all hyperlink (<a> <area> <link>) elements for rel microformats, adding to the JSON rels & rel-urls hashes accordingly
return the resulting JSON
Parsers may simultaneously parse the document for both class and rel microformats (e.g. in a single tree traversal).
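The top-level steps above can be sketched in Python as follows. This is only an illustrative skeleton, not a reference implementation: `new_result`, `parse_document`, and the two callables passed in are hypothetical names, and the callables stand in for the class and rel parsing procedures described in the later sections.

```python
import json

def new_result():
    """Empty top-level structure: an "items" array plus "rels" and "rel-urls" hashes."""
    return {"items": [], "rels": {}, "rel-urls": {}}

def parse_document(root, parse_class_microformats, parse_rel_microformats):
    """Run both parsing passes over `root` and return the combined result.

    Both callables are placeholders (assumptions of this sketch):
      parse_class_microformats(root) -> list of item structures
      parse_rel_microformats(root, rels, rel_urls) -> None, fills both hashes in place
    """
    result = new_result()
    result["items"].extend(parse_class_microformats(root))
    parse_rel_microformats(root, result["rels"], result["rel-urls"])
    return result

# Usage with stub passes that find nothing.
result = parse_document(
    None,
    lambda root: [],
    lambda root, rels, rel_urls: None,
)
print(json.dumps(result))
```

A parser that handles both kinds of microformats in a single tree traversal would merge the two callables into one visitor. The returned structure would be the same.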
parse an element for class microformats
To parse an element for class microformats:
parse element class for root class name(s) "h-*" and if none, backcompat root classes
if none found, parse child elements for microformats (depth first, doc order)
else if found, start parsing a new microformat
  keep track of whether the root class name(s) was from backcompat
  create a new { } structure with:
    type: [array of unique microformat "h-*" type(s) on the element sorted alphabetically],
    properties: { } - to be filled in when that element itself is parsed for microformats properties
    if the element has a non-empty id attribute:
      id: string value of element's id attribute
  parse child elements (document order) by:
    if parsing a backcompat root, parse child element class name(s) for backcompat properties
    else parse a child element class for property class name(s) "p-*,u-*,dt-*,e-*"
    if such class(es) are found, it is a property element
      add properties found to current microformat's properties: { } structure
    parse a child element for microformats (recurse)
      if that child element itself has a microformat ("h-*" or backcompat roots) and is a property element, add it into the array of values for that property as a { } structure, add to that { } structure:
        value:
          if it's a p-* property element, use the first p-name of the h-* child
          else if it's an e-* property element, re-use its { } structure with existing value: inside.
          else if it's a u-* property element and the h-* child has a u-url, use the first such u-url
          else use the parsed property value per p-*,u-*,dt-* parsing respectively
      else add found elements that are microformats to the "children" array
  imply properties for the found microformat (see below)
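The rule for choosing the value of a nested microformat that is also a property element can be illustrated with a small Python sketch. The function name `embed_nested` and the dictionary shapes are assumptions made for illustration. In particular, the "html" key in the e-* structure and the fallback to the parsed value when a p-* child has no p-name are assumptions of this sketch, not statements from the steps above.

```python
def embed_nested(prefix, child, parsed):
    """Return the { } structure to append to a property's array of values.

    prefix: "p", "u", "dt" or "e" (the property prefix on the element)
    child:  the nested microformat, e.g. {"type": [...], "properties": {...}}
    parsed: the value obtained by ordinary p-/u-/dt-/e- parsing of the element;
            for "e" this is assumed to be a dict that already contains "value"
    """
    props = child.get("properties", {})
    if prefix == "e":
        # Re-use the e-* structure (with its existing value) and add the microformat to it.
        merged = dict(parsed)
        merged.update(child)
        return merged
    if prefix == "p" and props.get("name"):
        value = props["name"][0]   # first p-name of the h-* child
    elif prefix == "u" and props.get("url"):
        value = props["url"][0]    # first u-url of the h-* child
    else:
        value = parsed             # plain p-*, u-*, dt-* parsing result
    return {**child, "value": value}

child = {
    "type": ["h-card"],
    "properties": {"name": ["Alice"], "url": ["https://example.com/"]},
}
print(embed_nested("p", child, "Alice Example")["value"])
print(embed_nested("u", child, "https://example.org/other")["value"])
print(embed_nested("dt", child, "2020-01-01")["value"])
```

The three calls select the child's first p-name, the child's first u-url, and the ordinary dt-* parsing result, in that order.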
The "*" for root (and property) class names consists of an optional vendor prefix (series of 1+ number or lowercase a-z characters i.e. [0-9a-z]+, followed by '-'), then one or more '-' separated lowercase a-z words.
parse an element for properties
parsing a p- property
To parse an element for a p-x property value (whether explicit p-* or backcompat equivalent):
Parse the element for the value-class-pattern. If a value is found, return it.
If abbr.p-x[title] or link.p-x[title], return the title attribute.
else if data.p-x[value] or input.p-x[value], then return the value attribute
else if img.p-x[alt] or area.p-x[alt], then return the alt attribute
else return the textContent of the element after:
dropping any nested