Efficient XML Interchange (EXI) Format 1

Efficient XML Interchange (EXI) Format 1.0
Efficient XML Interchange (EXI) Format 1.0
W3C Candidate Recommendation 08 December 2009
This version:
Latest version:
Previous version:
Editors:
John Schneider, AgileDelta, Inc.
Takuki Kamiya, Fujitsu Laboratories of America, Inc.
This document is also available in these non-normative formats:
XML
W3C
MIT
ERCIM
Keio
), All Rights Reserved. W3C
liability
trademark
and
document use
rules apply.
Abstract
This document is the specification of the Efficient XML Interchange (EXI)
format. EXI is a very compact representation for the Extensible Markup
Language (XML) Information Set that is intended to simultaneously optimize
performance and the utilization of computational resources. The EXI
format uses a hybrid approach drawn from the information and formal language
theories, plus practical techniques verified by measurements,
for entropy encoding XML information. Using a relatively simple algorithm,
which is amenable to fast and compact implementation, and a small set of
datatype representations,
it reliably produces efficient encodings of XML event streams.
The grammar production system and format definition of EXI are presented.
Status of this Document
This section describes the status of this document at the time
of its publication. Other documents may supersede this document. A
list of current W3C publications and the latest revision of this
technical report can be found in the
W3C technical reports index
at
This is the Candidate Recommendation of the Efficient XML Interchange Format 1.0. It has been produced by the
EXI Working Group
, which part of the
Extensible Markup Language (XML) Activity
W3C publishes a Candidate Recommendation to indicate that the document is believed to be stable and to encourage implementation by the developer community. Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The list of changes since the last publication is exhibited in the
Change Log
. A
diff-marked version
against the previous version of this document is also available.
The EXI Working Group plans to submit this specification for consideration as a W3C Proposed Recommendation when the following exit criteria have been met:
A test suite is available that tests each identified EXI format 1.0 feature, both required and optional.
At least two implementations, at least one of which is publicly available, have demonstrated interoperability of each feature. The working group will create an implementation report based on gathered evidence and make it available on its group web page.
The Working Group has responded formally to all issues raised against this document during the Candidate Recommendation review period.
There is no formal implementation report available at the present time.
The following feature is considered to be at risk:
Deriving Set of Characters from XML Schema Regular Expressions
The above feature may be removed in the subsequent revisions of this specification if it is found that implementations exhibit lack of interoperability in regards of the feature during the execution of the interoperability tests, and the problem cannot be recouped by simply clarifying the existing descriptions of the feature but would rather entail any semantic changes thereof.
Implementers are encouraged to provide feedback by 01 March 2010. Please comment by email to
public-exi-comments@w3.org
, a mailing list with a
public archive
. This mailing list is reserved for comments, it is inappropriate to send discussion email to this address. Discussion should take place on the
public-exi@w3.org
mailing list (
public archive
).
This document was produced by a group operating under the
5 February 2004 W3C Patent Policy
. W3C maintains a
public list of any patent disclosures
made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains
Essential Claim(s)
must disclose the information in accordance with
section 6 of the W3C Patent Policy
Table of Contents
1.
Introduction
1.1
History and Design
1.2
Notational Conventions and Terminology
2.
Design Principles
3.
Basic Concepts
4.
EXI Streams
5.
EXI Header
5.1
EXI Cookie
5.2
Distinguishing Bits
5.3
EXI Format Version
5.4
EXI Options
6.
Encoding EXI Streams
6.1
Determining Event Codes
6.2
Representing Event Codes
6.3
Fidelity Options
7.
Representing Event Content
7.1
Built-in EXI Datatype Representations
7.1.1
Binary
7.1.2
Boolean
7.1.3
Decimal
7.1.4
Float
7.1.5
Integer
7.1.6
Unsigned Integer
7.1.7
QName
7.1.8
Date-Time
7.1.9
n-bit Unsigned Integer
7.1.10
String
7.1.10.1
Restricted Character Sets
7.1.11
List
7.2
Enumerations
7.3
String Table
7.3.1
String Table Partitions
7.3.2
Partitions Optimized for Frequent use of Compact Identifiers
7.3.3
Partitions Optimized for Frequent use of String Literals
7.4
Datatype Representation Map
8.
EXI Grammars
8.1
Grammar Notation
8.1.1
Fixed Event Codes
8.1.2
Variable Event Codes
8.2
Grammar Event Codes
8.3
Pruning Unneeded Productions
8.4
Built-in XML Grammars
8.4.1
Built-in Document Grammar
8.4.2
Built-in Fragment Grammar
8.4.3
Built-in Element Grammar
8.5
Schema-informed Grammars
8.5.1
Schema-informed Document Grammar
8.5.2
Schema-informed Fragment Grammar
8.5.3
Schema-informed Element Fragment Grammar
8.5.4
Schema-informed Element and Type Grammars
8.5.4.1
EXI Proto-Grammars
8.5.4.1.1
Grammar Concatenation Operator
8.5.4.1.2
Element Grammars
8.5.4.1.3
Type Grammars
8.5.4.1.3.1
Simple Type Grammars
8.5.4.1.3.2
Complex Type Grammars
8.5.4.1.3.3
Complex Ur-Type Grammar
8.5.4.1.4
Attribute Uses
8.5.4.1.5
Particles
8.5.4.1.6
Element Terms
8.5.4.1.7
Wildcard Terms
8.5.4.1.8
Model Group Terms
8.5.4.1.8.1
Sequence Model Groups
8.5.4.1.8.2
Choice Model Groups
8.5.4.1.8.3
All Model Groups
8.5.4.2
EXI Normalized Grammars
8.5.4.2.1
Eliminating Productions with no Terminal Symbol
8.5.4.2.2
Eliminating Duplicate Terminal Symbols
8.5.4.3
Event Code Assignment
8.5.4.4
Undeclared Productions
8.5.4.4.1
Adding Productions when Strict is False
8.5.4.4.2
Adding Productions when Strict is True
9.
EXI Compression
9.1
Blocks
9.2
Channels
9.2.1
Structure Channel
9.2.2
Value Channels
9.3
Compressed Streams
10.
Conformance
10.1
EXI Stream Conformance
10.2
EXI Processor Conformance
Appendices
References
A.1
Normative References
A.2
Other References
Infoset Mapping
B.1
Document Information Item
B.2
Element Information Items
B.3
Attribute Information Item
B.4
Processing Instruction Information Item
B.5
Unexpanded Entity Reference Information item
B.6
Character Information item
B.7
Comment Information item
B.8
Document Type Declaration Information item
B.9
Unparsed Entity Information Item
B.10
Notation Information Item
B.11
Namespace Information Item
XML Schema for EXI Options Document
Initial Entries in String Table Partitions
D.1
Initial Entries in Uri Partition
D.2
Initial Entries in Prefix Partitions
D.3
Initial Entries in Local-Name Partitions
Deriving
Set of Characters
from XML Schema Regular Expressions
Content Coding and Internet Media Type
F.1
Content Coding
F.2
Internet Media Type
Example Encoding
(Non-Normative)
Schema-informed Grammar Examples
(Non-Normative)
H.1
Proto-Grammar Examples
H.2
Normalized Grammar Examples
H.3
Complete Grammar Examples
Recent Specification Changes
(Non-Normative)
I.1
Changes from Last Call Working Draft
I.2
Changes from Fourth Public Working Draft
I.3
Changes from Third Public Working Draft
I.4
Changes from Second Public Working Draft
I.5
Changes from First Public Working Draft
Acknowledgements
(Non-Normative)
1. Introduction
The Efficient XML Interchange (EXI) format is a very compact, high
performance XML representation that was designed to work well for a
broad range of applications. It simultaneously improves performance
and significantly reduces bandwidth requirements without compromising
efficient use of other resources such as battery life, code size,
processing power, and memory.
EXI uses a grammar-driven approach that achieves very efficient
encodings using a straightforward encoding algorithm and a small set
of
datatype representations.
Consequently,
EXI processors
are relatively simple and
can be implemented on devices with limited capacity.
EXI is schema
"informed", meaning that it can utilize available schema
information to improve compactness and performance, but does not
depend on accurate, complete or current schemas to work. It supports
arbitrary schema extensions and deviations and also works very
effectively with partial schemas or in the absence of any schema. The
format itself also does not depend on any particular schema language,
or format, for schema information.
[Definition:]
A program module
called an
EXI processor
, whether it is software or
hardware, is used by application programs to encode their structured data
into
EXI streams
and/or to decode
EXI streams
to make the structured
data accessible.
The former and latter aforementioned roles of EXI processors are called
[Definition:]
EXI stream encoder
and
[Definition:]
EXI stream decoder
respectfully.

This document not only specifies the
EXI format, but also defines errors that EXI processors are required to
detect and behave upon.
The primary goal of this document is to define the EXI format completely without leaving ambiguity so as to make it feasible for implementations to interoperate. As such, the document lends itself to describing the design and features of the format in a systematic manner, often declaratively with relatively few prosaic annotations and examples. Those readers who prefer a step-by-step introduction to the EXI format design and features are suggested to start with the non-normative
[EXI Primer]
1.1 History and Design
EXI is the result of extensive work carried out by the W3C's XML
Binary Characterization (XBC) and Efficient XML Interchange (EXI)
Working Groups. XBC was chartered to investigate the costs and
benefits of an alternative form of XML, and formulate a way to objectively
evaluate the potential of a substitute format for XML. Based on XBC's
recommendations, EXI was chartered, first to measure, evaluate, and
compare the performance of various XML technologies (using metrics
[XBC Measurement Methodologies]
), and then, if it appeared
suitable, to formulate a recommendation for a W3C format
specification. The measurements results and analyses, are presented
elsewhere
[EXI Measurements Note]
. The format described in this
document is the specification so recommended.
The functional requirements of the EXI format are those that were
prepared by the XBC WG in their analysis of the desirable properties
of a high performance representation for XML
[XBC Properties]
Those properties were derived from a very broad set of use cases also
identified by the XBC working group
[XBC Use Cases]
The design of the format presented here, is largely based on the
results of the measurements carried out by the group to evaluate the
performance characteristics (mainly of processing efficiency and
compactness) of various existing formats. The EXI format is based on
Efficient XML
[Efficient XML]
, including for example the basis heuristic grammar approach,
compression algorithm, and resulting entropy encoding.
EXI is compatible with XML at the XML Information Set
[XML Information Set]
level, rather than at the XML syntax level. This
permits it to encapsulate an efficient alternative syntax and grammar
for XML, while facilitating at least the potential for minimizing the
impact on XML application interoperability.
1.2 Notational Conventions and Terminology
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear
EMPHASIZED in this document, are to be interpreted as described in RFC
2119
[IETF RFC 2119]
. Other terminology used to describe the EXI
format is defined in the body of this specification.
The term
event
and
stream
is used throughout this document to denote
EXI event
and
EXI stream
respectively unless the words are qualified differently to mean otherwise.
This document specifies an abstract grammar for EXI. In grammar notation, all terminal
symbols are represented in plain text and all non-terminal symbols are
represented in
italics
. Grammar productions are
represented as follows:
LeftHandSide
Terminal
NonTerminal
A set of one or more grammar productions that share the same
left-hand side non-terminal symbol are often presented together annotated
with
event codes
that specify how events matching the terminal symbols of the associated productions are represented in the EXI stream as follows:
LeftHandSide
Terminal
NonTerminal
EventCode
Terminal
NonTerminal
EventCode
Terminal
NonTerminal
EventCode
...
Terminal
NonTerminal
EventCode
Section
8.1 Grammar Notation
introduces additional notations for describing productions and
event codes
in grammars. Those additional notations facilitate concise representation of the EXI grammar system.
[Definition:]
In this document, the term
qname
is used to denote a
QName
XS2
QName values are composed of a uri, a localname and an optional prefix. Two qnames are considered equal if they have the same uri and localname, regardless of their prefix values. In cases where prefixes are not relevant, such as in the grammar notation, they are not specified by this document.
Terminal symbols that are qualified with a qname permit the use of a wildcard symbol (*) in place of or as part of a qname. The forms of terminal symbols involving qname wildcards used in grammars and their definitions are described in the table below.
Wildcard
Definition
SE (*)
The terminal symbol that matches a start element (SE) event with any qname.
SE (
uri
: *)
The terminal symbol that matches a start element (SE) event with any local-name in namespace
uri
AT (*)
The terminal symbol that matches an attribute (AT) event with any qname.
AT (
uri
: *)
The terminal symbol that matches an attribute (AT) event with any local-name in namespace
uri
Several prefixes are used throughout this document to designate certain namespaces. The bindings shown below are assumed, however, any prefixes can be used in practice if they are properly bound to the namespaces.
Prefix
Namespace Name
exi
xsd
xsi
In describing the layout of an EXI format construct, a pair of square brackets [ ] are used to surround the name of a field to denote that the occurrence of the field is optional in the structure of the part or component that contains the field.
In arithmetic expressions, the notation ⌈
⌉ where
represents a real number denotes the ceiling of
, that is, the smallest integer greater than or equal to
When it is stated that strings are sorted in lexicographical order,
it is done so character by character, and the order among characters are determined by comparing their Unicode codepoints.
2. Design Principles
The following design principles were used to guide the development of EXI and encourage consistent design decisions. They are listed here to provide insight into the EXI design rationale and to anchor discussions on desirable EXI traits.
General:
One of primary objectives of EXI is to maximize the number of systems, devices and applications that can communicate using XML data. Specialized approaches optimized for specific use cases should be avoided.
Minimal:
To reach the broadest set of small, mobile and embedded applications, simple, elegant approaches are preferred to large, analytical or complex ones.
Efficient:
EXI must be competitive with hand-optimized binary formats so it can be used by applications that require this level of efficiency.
Flexible:
EXI must deal flexibly and efficiently with documents that contain arbitrary schema extensions or deviate from their schema. Documents that contain schema deviations should not cause encoding to fail.
Interoperable:
EXI must integrate well with existing XML technologies, minimizing the changes required to those technologies. It must be compatible with the XML Information Set
[XML Information Set]
, without significant subsetting or supersetting, in order to maintain interoperability with existing and prospective XML specifications.
3. Basic Concepts
EXI achieves broad generality, flexibility, and performance, by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur at any given point in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, EXI is optimized specifically for XML languages.
The built-in EXI grammars accept any XML document or fragment and may be augmented with productions derived from XML Schemas
[XML Schema Structures]
[XML Schema Datatypes]
, RELAX NG schemas
[ISO/IEC 19757-2:2003]
DTDs
[XML 1.0]
[XML 1.1]
or other sources of information about what is likely to occur in a set of XML documents. The
EXI stream encoder
uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream of
events
The
EXI stream encoder
then represents the stream of events using a set of simple variable length codes called
event codes
Event codes
are similar to Huffman codes
[Huffman Coding]
, but are much simpler to compute and maintain. They are encoded directly as a sequence of values, or if additional compression is desired, they are passed to the
EXI compression
algorithm, which replaces frequently occurring event patterns to further reduce size.
When schemas are used, EXI also supports a user-customizable set of Datatype Representations for efficiently representing typed values.
4. EXI Streams
[Definition:]
An
EXI stream
is an
EXI header
followed by an EXI body.
[Definition:]
The
EXI body
carries the content of the document, while the EXI header communicates the options used for encoding the EXI body.
Section
5. EXI Header
describes the
EXI header
[Definition:]
The building block of an EXI body is an
EXI event
An EXI body consists of a sequence of EXI events representing an
EXI document
or an
EXI fragment
The EXI events permitted at any given position in an EXI stream are determined by the EXI grammar.
As is the case with XML,
the events occur with nesting pairs of matching start element and end element events where any pair does not intersect with another except when it is fully contained in the other.

The EXI grammar incorporates knowledge of the XML grammar and may be augmented and refined using schema information and fidelity options. The EXI grammar is formally specified in section
8. EXI Grammars
An EXI body can represent an EXI document with a single root element or an EXI fragment with zero or more root elements.
[Definition:]
EXI documents
are EXI bodies with a single root element that conform to the Built-in Document Grammar (See
8.4.1 Built-in Document Grammar
) or Schema-informed Document Grammar (See
8.5.1 Schema-informed Document Grammar
).
[Definition:]
EXI fragments
are EXI bodies with zero or more root elements that conform to the Built-in Fragment Grammar (See
8.4.2 Built-in Fragment Grammar
) or Schema-informed Fragment Grammar (See
8.5.2 Schema-informed Fragment Grammar
).
[Definition:]
When schema information is available to describe the contents of an EXI body, such an EXI stream is a
schema-informed EXI stream
and the EXI body is interpreted according to the Schema-informed Document Grammar (See
8.5.1 Schema-informed Document Grammar
) or Schema-informed Fragment Grammar (See
8.5.2 Schema-informed Fragment Grammar
).
[Definition:]
Otherwise, an EXI stream is a
schema-less EXI stream
, and the EXI body is interpreted according to the Built-in Document Grammar (See
8.4.1 Built-in Document Grammar
) or Built-in Fragment Grammar (See
8.4.2 Built-in Fragment Grammar
).
The following table summarizes the EXI event types and associated event content that occur in an EXI stream.
[Definition:]
The content of an event consists of
content items
and the content items appear in an EXI stream in the order they are shown in the table
following their respective
event codes
that
each marks the start of an
event
In addition, the table includes the grammar notation used to represent each
event
in this specification. Each
event
in an EXI stream participates in a mapping system that relates
events
to XML Information Items so that an EXI document
or an EXI fragment
as a whole serves to represent an XML Information Set. The table shows XML Information Items relevant to each EXI event. Appendix
B Infoset Mapping
describes the mapping system in detail.
Table 4-1. EXI events
EXI Event Type
Event Content
(Content Items)
Grammar Notation
(Terminal Symbols)
Information Item
Start Document
SD
B.1 Document Information Item
End Document
ED
Start Element
qname
SE (
qname
B.2 Element Information Items
SE ( * )
SE (
uri :
* )
End Element
EE
Attribute
qname, value
AT (
qname
B.3 Attribute Information Item
AT ( * )
AT (
uri :
* )
Characters
value
CH
B.6 Character Information item
Namespace Declaration
uri
prefix
local-element-ns
NS
B.11 Namespace Information Item
Comment
text
CM
B.7 Comment Information item
Processing Instruction
name, text
PI
B.4 Processing Instruction Information Item
DOCTYPE
name, public, system, text
DT
B.8 Document Type Declaration Information item
Entity Reference
name
ER
B.5 Unexpanded Entity Reference Information item
Self Contained
SC
Section
6. Encoding EXI Streams
describes the algorithm used to encode
events
in the EXI stream.
As indicated in the table above, there are some event types that carry content with their
event
instances while other event types function as markers without content.
SE events may be followed by a series of NS events. Each NS event either associates a prefix with an URI, assigns a default namespace, or in the case of a namespace declaration with an empty URI, rescinds one of such associations in effect at the point of its occurrence. The effect of the association or disassociation caused by a NS event stays in effect until the corresponding EE event occurs.
Like XML, the namespace of a particular element may be specified by a namespace declaration
preceding
the element or a local namespace declaration following the element name. When the namespace is specified by a local namespace declaration, the
local-element-ns
flag of the associated NS event is set to true and the prefix of the element is set to the prefix of that NS event. When the namespace is specified by a previous namespace declaration, the
local-element-ns
flag of all local NS events is false and the prefix of the element is set according to the prefix component of the element
qname
. The series of NS events associated with a particular element may include at most one NS event with its
local-element-ns
flag
set to true. The
uri
of a NS event with its
local-element-ns
flag
set to true MUST match the
uri
of the associated SE event.
An SE event may be followed by a SC event, indicating the element is self-contained and can be read independently from the rest of the EXI body. Applications may use self-contained elements to index portions of the EXI body for random access.
The representation of
event codes
which identify the event type and start each event is described in
6.2 Representing Event Codes
Each item in the event content has a
datatype representation
associated with it as shown in the following table. The content of each
event
, if any, is encoded as a sequence of items each of which being encoded according to its
datatype representation
in order starting with the first item followed by subsequent items.
Table 4-2. Datatype representations of event content items
Content item
Used in
Datatype representation
name
PI, DT, ER
7.1.10 String
prefix
NS
7.1.10 String
local-element-ns
NS
7.1.2 Boolean
public
DT
7.1.10 String
qname
SE, AT
7.1.7 QName
system
DT
7.1.10 String
text
CM, PI
7.1.10 String
uri
NS
7.1.10 String
value
CH, AT
According to the schema datatype (see
7. Representing Event Content
) if any is in effect, otherwise
7.1.10 String
The datatype representation
used for each
value
content item depends on the schema
datatype
XS2
if any that is in effect for that
value
The String datatype representation (see
7.1.10 String
is used for
values
that do not have an associated schema datatype,
cannot be or are opted not to be represented by their associated datatype representations,
or occur in mixed content. Section
7. Representing Event Content
describes how each of the types listed above are encoded in an EXI stream.
Editorial note
The syntax and semantics of the NS event are designed to minimize the overhead required for representing namespace prefixes in EXI streams without introducing significant complexity. In the common case where each namespace is bound to a single prefix, this design reduces the overhead for representing all element and attribute namespace prefixes to zero bits.
5. EXI Header
Each
EXI stream
begins with an EXI header.
[Definition:]
The
EXI header
can identify EXI streams,

distinguish EXI
streams

from text XML documents,
identify the version of the EXI format being used, and specify the options used to process the body of the EXI stream.
The EXI header has the following structure:
[ EXI Cookie ]
Distinguishing Bits
Presence Bit
EXI Format
EXI Options
[Padding Bits]
for EXI Options
Version
The EXI Options field within an EXI header is optional. Its presence is indicated by
the value of the presence bit that follows
Distinguishing Bits
. The presence and absence is indicated by the value 1 and 0, respectively.
When the
compression
option is true, or the
alignment
option is
byte-alignment
or
pre-compression
padding bits of minumum length required to make the whole length of
the header byte-aligned are added at the end of the header.
On the other hand, there are no padding bits when the alignment in use is
bit-packed
The padding bits field
if it is present
can contain any values of bits as its contents.
The details of the
EXI Cookie
Distinguishing Bits
EXI Format Version
and
EXI Options
are described in the following sections.
5.1 EXI Cookie
[Definition:]
An
EXI header
MAY start with an
EXI Cookie
which is a four byte field that serves to indicate that the stream of which it is a part is an EXI stream.
The four byte field consists of four characters
" $ " , " E ", " X " and " I "

in that order, each represented as an ASCII octet, as follows.
This four byte sequence is particular to EXI and specific enough to distinguish EXI streams from a broad range of data types currently used on the Web. While the EXI cookie is optional, its use is RECOMMENDED in the EXI header when the EXI stream is exchanged in a context where a longer, more specific content-based datatype identifier is desired than that provided by the
Distinguishing Bits
, whose role is more narrowly focused on distinguishing EXI streams from XML documents.
5.2 Distinguishing Bits
[Definition:]
The second part in the EXI header is the
Distinguishing Bits
which is a two bit field of which the first bit contains the value 1 and the second bit contains the value 0, as follows.
Unlike the optional EXI cookie that MAY occur to precede this field, the presence of Distinguishing Bits is REQUIRED in the EXI header. It is used to distinguish EXI streams from text XML documents in the absence of an
EXI cookie

This two bit sequence is the minimum that suffices to distinguish EXI
streams from XML documents since it is the minimum length bit
pattern that cannot occur as the first two bits of a well-formed XML
document represented in any one of the conventional character
encodings, such as UTF-8, UTF-16, UCS-2, UCS-4, EBCDIC, ISO 8859,
Shift-JIS and EUC, according to
XML
[XML 1.0]
[XML 1.1]
Therefore, XML Processors that do not support EXI are expected to reject an EXI stream as early as they read
and process the first byte from the stream.
Systems that use EXI streams as well as XML documents can reliably look at
the Distinguishing Bits to determine whether to interpret a particular
stream as XML or EXI.
5.3 EXI Format Version
[Definition:]
The fourth part in the EXI header is the
EXI Format Version
, which identifies the version of the EXI format being used.
EXI format version numbers are integers. Each version of the EXI Format Specification specifies the corresponding EXI format version number to be used by conforming implementations. The EXI format version number that corresponds with this version of the EXI format specification is 1 (one).
The first bit of the version field indicates whether the version is a preview or final version of the EXI format.
A value of 0 indicates this is a final version and a value of 1 indicates this is a preview
version. Final versions correspond to final, approved versions of the EXI format specification.
An
EXI processor
that implements a final version of the EXI format specification is REQUIRED to process EXI streams that have a version field with its first bit set to 0 followed by a version number that corresponds to the version of the EXI specification the processor implements. The behavior of an EXI processor on an EXI stream with its first bit set to 0 followed by a version not corresponding to a version implemented by the processor is not constrained by this specification. For example, the EXI processor MAY reject such a stream outright or it MAY attempt to process the EXI body.

Preview versions of the EXI format are useful for
gaining implementation and deployment experience prior to finalizing a
particular version of the EXI format. While preview versions may match drafts of this specification, they are not governed by this specification and the behaviour of EXI processors encountering preview versions of the EXI format is implementation dependent. Implementers are free to coordinate to achieve interoperability between different preview versions of the EXI format.
Following the first bit of the version is a sequence of one or more
4-bit unsigned integers representing the version number. The version
number is determined by summing this sequence of 4-bit unsigned
values and adding 1 (one). The sequence is terminated by any 4-bit unsigned integer with
a value in the range 0-14. As such, the first 15 version numbers are
represented by 4 bits, the next 15 are represented by 8 bits, etc.
Given an EXI stream with its stream cursor positioned just past the first bit of the EXI format version field, the EXI format version number can be computed by going through the following steps with version number initially set to 1.
Read next 4 bits as an unsigned integer value.
Add the value that was just read to the version number.
If the value is 15, go to step 1, otherwise (i.e. the value being in the range of 0-14), use the current value of the version number as the EXI version number.
The following are example EXI format version numbers.
Example 5-1.
EXI Format Version Examples
EXI Format Version Field
Description
1 0000
Preview version 1
0 0000
Final version 1
0 1110
Final version 15
0 1111 0000
Final version 16
0 1111 0001
Final version 17
EXI processors
conforming with the final version of this
specification MUST use the 5-bit value 0 0000 as the version
number.
5.4 EXI Options
[Definition:]
The
fifth

part of the EXI
header is the
EXI Options
, which provides a way to specify the
options used to encode the body of the EXI stream
[Definition:]
The EXI Options are represented as an
EXI Options document
, which is an XML document encoded using the EXI format described in this specification.
This results in a very compact header
format that can be read and written with very little additional software.
The presence of EXI Options in its entirety is optional in EXI header,
and it is predicated on the value of the presence bit that follows the
Distinguishing Bits
When EXI Options are present in the header, an EXI Processor MUST observe the
specified options to process the EXI stream that follows. Otherwise,
an EXI Procesor may obtain the EXI options using another mechanism.
There are no fallback option values provided by this specification for use
in the absence of the whole EXI Options part.
EXI processors
MAY provide external means for applications or users to
specify EXI Options when the EXI Options document is absent.
Such
EXI processors
are typically used in controlled systems
where the knowledge about the effective EXI Options is shared prior to
the exchange of EXI
streams . The mechanisms to communicate out-of-bound
EXI Options and their representation are implementation dependent.
The following table describes the EXI options that may be specified in the
EXI Options document.
Table 5-1. EXI Options in Options Document
EXI Option
Description
Default Value
alignment
Alignment of
event codes
and
content items
bit-packed
compression
EXI compression is used to achieve better compactness
false
strict
Strict interpretation of schemas is used to achieve better compactness
false
fragment
Body is encoded as an
EXI fragment
instead of an
EXI document
false
preserve
Specifies whether comments, pis, etc. are preserved
all false
selfContained
Enables self-contained elements
false
schemaID
Identify the schema information, if any, used to encode the body
no default value
datatypeRepresentationMap
Specify alternate datatype representations for typed
values
in the
EXI body
no default value
blockSize
Specifies the block size used for EXI compression
1,000,000
valueMaxLength
Specifies the maximum string length of
value
content items to be considered for addition to the string table.
unbounded
valuePartitionCapacity
Specifies the total capacity of value partitions in a string table
unbounded
[user defined meta-data]
User defined meta-data may be added
no default value
Appendix
C XML Schema for EXI Options Document
provides an XML Schema
describing
the EXI Options document
This schema is designed to produce smaller headers
for option combinations used when compactness is critical.
The
EXI Options document
is
represented as an
EXI body
informed by the above mentioned schema using the default options
specified by the following XML document.
An EXI Options document consists only of an EXI body, and MUST NOT
start with an EXI header.
Header options used for encoding the
EXI Options document

Note that this specification does not require
EXI processors
to read and process the schema prescribed for EXI options document (
C XML Schema for EXI Options Document
), in order to process EXI options documents. EXI processors MUST use the schema-informed grammars that stem from the schema in processing EXI options documents, beyond which there is no requirement as to the use of the schema, and implementations are free to use any methods to retrieve the instructions that observe the grammars for processing EXI options documents. Section
8.5 Schema-informed Grammars
describes the system to derive schema-informed grammars from XML Schemas.
Below is a brief description of each EXI option.
[Definition:]
The
alignment option
is used to control the alignment of
event codes
and
content items
The value is one of
bit-packed
byte-alignment
or
pre-compression
, of which
bit-packed
is the default value assumed when the "alignment" element is absent in the
EXI Options document
When the value of
compression option
is set to true, alignment of the EXI Body is governed by the rules specified in
9. EXI Compression
instead of the alignment option value. The "alignment" element MUST NOT appear in an EXI options document when the "compression" element is present.
[Definition:]
The alignment option value
bit-packed
indicates that the the
event codes
and associated content are packed in bits without any padding in-between.
[Definition:]
The alignment option value
byte-alignment
indicates that the
event codes
and associated content are aligned on byte boundaries.
While byte-alignment generally results in EXI streams of larger sizes compared with their bit-packed equivalents, byte-alignment may provide a help in some use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in-place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.
[Definition:]
The alignment option value
pre-compression
indicates that all steps involved in compression (see section
9. EXI Compression
) are to be done with the exception of the final step of applying the DEFLATE algorithm.
The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.
[Definition:]
The
compression option
is a Boolean used to increase compactness using additional computational resources.
The default value "false" is assumed when the "compression" element is absent in the
EXI Options document
When set to true, the
event codes
and associated content are compressed according to
9. EXI Compression
regardless of the
alignment
option value. As mentioned above, the "compression" element MUST NOT appear in an EXI options document when the "alignment" element is present.
[Definition:]
The
strict option
is a Boolean used to increase compactness by using a strict interpretation of the schemas and omitting preservation of certain items, such as comments, processing instructions and namespace prefixes.
The default value "false" is assumed when the "strict" element is absent in the
EXI Options document
When set to true,
those productions that have NS, CM, PI, ER, and SC terminal symbols are omitted from the
EXI grammars, and schema-informed element and type grammars are restricted to only permit items declared in the schemas.
Consequently, when the strict option is set to true, xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes are only permitted to appear in a schema-informed element where they match specific schema declarations (i.e., wildcards or ur-types).

The "strict" element MUST NOT appear in an
EXI options document
when
one of "dtd", "prefixes", "comments", "pis" or "selfContained"
element is present in the same options document.
[Definition:]
The
fragment option
is a Boolean that indicates whether the
EXI body
is an
EXI document
or an
EXI fragment
When set to true, the
EXI body
is an
EXI fragment
. Otherwise, the
EXI body
is an
EXI document
. The default value "false" is assumed when the "fragment" element is absent in the
EXI Options document
[Definition:]
The
preserve option
is a set of Booleans that can be set independently to control whether
or how
certain information items are preserved in the EXI stream.
Section
6.3 Fidelity Options
describes the set of information items
affected
by the preserve option.
The elements "dtd", "prefixes", "comments" and "pis"
MUST NOT appear in an
EXI options document
when the "strict" element is present in the same options document.
The element "lexicalValues", on the other hand, is permitted to occur in the presence of "strict" element.
[Definition:]
The
selfContained option
is a Boolean used to enable the use of self-contained elements in the EXI stream.
Self-contained elements may be read independently from the rest of the EXI body, allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in an
EXI options document
when
one of "compression", "pre-compression" or "strict" elements are present
in the same options document. The default value "false" is assumed when the "selfContained" element is abscent from the
EXI Options document
[Definition:]
The
schemaID option
may be used to identify the schema information used
for processing
the EXI body.
When the
"schemaID" element in the
EXI options document
contains the xsi:nil attribute
with its value set to true,
no schema information
is used for processing
the EXI body.
When the value of the "schemaID" element is empty, no user defined schema information is used for processing the EXI body;
however, the built-in XML schema types are available for use in the EXI body.
When the schemaID option is absent (i.e., undefined), no statement is made about the schema information used to encode the EXI body and this information
MUST be communicated out of band.
This specification does not dictate the syntax or semantics of other values specified in this field. An example schemaID scheme is the use of URI that is apt for globally identifying schema resources on the Web.
The parties involved in the exchange are free to agree on the scheme of schemaID field that is appropriate for their use to uniquely identify the schema information.
[Definition:]
The
datatypeRepresentationMap option
specifies an alternate set of datatype representations for typed
values
in
the
EXI body
as described in
7.4 Datatype Representation Map
When there are no "datatypeRepresentationMap" elements in the
EXI Options document
, no Datatype Representation Map is used for processing the EXI body.
This option does not take effect when the value of the preserve.lexicalValues fidelity option is true (see
6.3 Fidelity Options
),
or when the
EXI stream
is a
schema-less EXI stream.
[Definition:]
The
blockSize option
specifies the block size used for EXI compression.
When the "blockSize" element is absent in the
EXI Options document
, the default blocksize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.
[Definition:]
The
valueMaxLength option
specifies the maximum length of
value
content items to be considered for addition to the string table.
The default value "unbounded" is assumed when the "valueMaxLength" element is absent in the
EXI Options document
[Definition:]
The
valuePartitionCapacity option
specifies the maximum number of
value
content items in the string table at any given time.
The default value "unbounded" is assumed when the "valuePartitionCapacity" element is absent in the
EXI Options document
Section
7.3.3 Partitions Optimized for Frequent use of String Literals
specifies the behavior of the string table when this capacity is reached.
[Definition:]
The
user defined meta-data
conveys auxillary information that applications may use to facilitate interpretation of the EXI stream.
The user defined meta-data MUST NOT be interpreted in a way that alters or extends the EXI data format defined in this specification.
User defined meta-data may be added to an EXI Options document just prior to the
alignment
option.
6. Encoding EXI Streams
The rules for encoding a series of
events
as an
EXI stream
are very
simple and are driven by a declarative set of grammars that describes
the structure of an
EXI stream
. Every
event
in the stream is
encoded using the same set of encoding rules, which are summarized as
follows:
Get the next event
data
to be encoded
If fidelity options (see
6.3 Fidelity Options
) indicate this event type is not processed,
go to step 1
Use the grammars to determine the
event code
of the
event
Encode the
event code
followed by the event content (see
Table 4-1
Evaluate the grammar production matched by the
event
Repeat until the End Document (ED) event is encoded
Self-contained (SC), namespace (NS) and attribute (AT) events associated with a given element occur directly after the start element (SE) event in the following order:
SC
NS
NS
...
NS
AT (xsi:type)
AT (xsi:nil)
AT
AT
...
AT
Namespace (NS) events occur in document order.
The xsi:type and xsi:nil attributes
occur before all other AT events.
When the grammar currently in effect for the element is either a
built-in element grammar
(see
8.4.3 Built-in Element Grammar
) or a
schema-informed element fragment grammar
(see
8.5.3 Schema-informed Element Fragment Grammar
), the remaining attribute (AT) events can occur in any order. Otherwise, when the grammar in effect is a
schema-informed element grammar
or a
schema-informed type grammar
(see
8.5.4 Schema-informed Element and Type Grammars
), the remaining attributes can occur in any order that is permitted by the grammar, though in practice they SHOULD occur in lexicographical order sorted first by
qname
local-name then by
qname
uri for achieving better compactness, where a
qname
is a
qname
of an attribute.
EXI uses the same simple procedure described above, to encode well-formed documents, document fragments, schema-valid information items, schema-invalid information items, information items partially described by schemas and information items with no schema at all. Only the grammars that describe these items differ. For example, an element with no schema information is encoded according to the XML grammar defined by the XML specification, while an element with schema information is encoded according to the more specific grammar defined by that schema.
[Definition:]
An
event code
is a sequence of 1 to 3 non-negative integers called parts used to identify each event in an EXI stream. The EXI grammars describe which events may occur at each point in an EXI stream and associate an even code with each one.
(See
8.2 Grammar Event Codes
for more description of event codes.)
Section
6.1 Determining Event Codes
describes in detail how the grammar is used to determine the event code of an
event
. Section
6.2 Representing Event Codes
describes in detail how event codes are represented as bits. Section
6.3 Fidelity Options
describes available fidelity options and how they
affect
the EXI stream. Section
7. Representing Event Content
describes how the typed event contents are represented as bits.
6.1 Determining Event Codes
The structure of an EXI stream is described by the EXI grammars, which are formally specified in section
8. EXI Grammars
. Each grammar defines which events are permitted to occur at any given point in the EXI stream and provides a pre-assigned
event code
for each one.
For example, the grammar productions below describe the events that can occur in a schema-informed EXI stream after the Start-Document (SD) event provided there are four global elements defined in the schema and assign an
event code
for each one.
See
8.5.1 Schema-informed Document Grammar
for the process used for generating the grammar productions below from the schema.
Example 6-1.
Example productions with event codes
Syntax
Event Code
DocContent
SE ("A")
DocEnd
SE ("B")
DocEnd
SE ("C")
DocEnd
SE ("D")
DocEnd
SE (*)
DocEnd
DT
DocContent
5.0
CM
DocContent
5.1.0
PI
DocContent
5.1.1
At the point in an EXI stream where the above grammar productions are in effect, the
event code
of Start Element "A" (i.e. SE("A")) is 0. The event code of a DOCTYPE (DT) event at this point in the stream is 5.0, and so on.
6.2 Representing Event Codes
Each
event code
is represented by a sequence of 1 to 3 parts that uniquely identify an event.
Event code
parts are encoded in order starting with the first part followed by subsequent parts.
The
-th part of an
event code
is encoded as an
-bit unsigned integer (
7.1.9 n-bit Unsigned Integer
), where
is ⌈ log
⌉ and
is the number of distinct values used as the
-th part of its own and all its sibling event codes in the current grammar.

Two
event codes
are siblings at the
-th part if and only if they share the same values in all preceding parts. All
event codes
are siblings at the first part.
If there is only one distinct value for a given part, the part is omitted (i.e., encoded in log
1 = 0 bits = 0 bytes).
For example, the eight
event codes
shown in the
DocContent
grammar above have values ranging from 0 to 5 for the first part. Six distinct values are needed to identify the first part of these
event codes
Therefore, the first part can be encoded as an
-bit unsigned integer where
= ⌈ log
6 ⌉ = 3. In the same fashion, the second and third part (if present) are represented as
-bit unsigned integers where
is ⌈ log
2 ⌉ = 1 and ⌈ log
2 ⌉ = 1 respectively.
When the value of the
compression option
is false and
bit-packed
alignment is used,
-bit unsigned integers are represented using
bits. The first table below illustrates how the
event codes
of each
event matched by the
DocContent
grammar above are represented in this case.
When the value of
compression option
is true, or either
byte-alignment
or
pre-compression
alignment option is used,
-bit unsigned integers are represented using the minimum number of bytes required to store
bits. The second table below illustrates how the
event codes
of each
event matched by the
DocContent
grammar above are represented in this case.
Example 6-2.
Example event code encoding
Table 6-1. Example event code encoding
when EXI compression is not in effect and
bit-packed
alignment option is used
Event
Part values
Event Code Encoding
# bits
SE ("A")
000
SE ("B")
001
SE ("C")
010
SE ("D")
011
SE (*)
100
DT
101  0
CM
101  1  0
PI
101  1  1
# distinct values (
# bits per part
⌈ log
Table 6-2. Example event code encoding
when EXI compression is in effect, or either
byte-alignment
or
pre-compression
alignment option is used
Event
Part values
Event Code Encoding
# bytes
SE ("A")
00000000
SE ("B")
00000001
SE ("C")
00000010
SE ("D")
00000011
SE (*)
00000100
DT
00000101  00000000
CM
00000101  00000001  00000000
PI
00000101  00000001  00000001
# distinct values (
# bytes per part
⌈ (log
) / 8 ⌉
6.3 Fidelity Options
Some XML applications do not require the entire XML feature set and would prefer to eliminate the overhead associated with unused features. For example, the SOAP 1.2 specification
[SOAP 1.2]
prohibits the use of XML
processing instructions.
In addition, there are many data-exchange use cases that do not require XML comments or DTDs.
The
preserve option
in EXI Options comprises a set of fidelity options, each of which independently controls the preservation (or preservation level)
of a certain type of information item.
Applications can use the
preserve option
to specify the set of fidelity options they require.
As specified in section
8.3 Pruning Unneeded Productions
, EXI processors MUST use these fidelity options to prune
productions that match the associated events from the grammars, improving compactness and processing efficiency.
The table below lists the fidelity options supported by this version of the EXI specification and describes the effect setting these options has on the
EXI stream
Table 6-3. Fidelity options
Fidelity option
Effect
Preserve.comments
CM events are preserved
Preserve.pis
PI events are preserved
Preserve.dtd
DOCTYPE and ER events are preserved
Preserve.prefixes
NS events and namespace prefixes are preserved
Preserve.lexicalValues
Lexical form of element and attribute values is preserved
in
value
content items
When
qualified names
NS
are used in the
value
of AT or CH events in an EXI Stream, the Preserve.prefixes fidelity option SHOULD be turned on to preserve the NS prefix declarations used by these values.
See section
4. EXI Streams
for the definition of EXI event types and their associated
content items
7. Representing Event Content
The
event code
of each event in an EXI body is represented as a sequence of
-bit unsigned integers (
7.1.9 n-bit Unsigned Integer
).
See section
6.2 Representing Event Codes
for the description of the event code representation.
The
content items
of an event are represented according to their datatype representations (see
Table 4-2
). In the absence of an associated datatype representation, attribute and character
values
are
represented as String (
7.1.10 String
).
[Definition:]
EXI defines a minimal set of datatype representations called
Built-in EXI datatype representations
that define how
content items
as well as the parts of an
event code
are represented in EXI streams.
When the
preserve.lexicalValues
option is false,
values
are represented using built-in EXI datatype representations
(see
7.1 Built-in EXI Datatype Representations
) or user-defined datatype representations
(see
7.4 Datatype Representation Map
) associated with the schema
datatypes
XS2

Otherwise,
values
are represented as Strings with restricted character sets (see
Table 7-2
below). The following table lists the
built-in EXI datatype representations, associated
EXI datatype identifiers
and the XML Schema
built-in datatypes
XS2
each
EXI datatype representation
is used to represent by default.
Table 7-1. Built-in EXI Datatypes
Built-in EXI Datatype Representation
EXI Datatype ID
XML Schema Datatypes
Binary
exi:base64Binary
base64Binary
exi:hexBinary
hexBinary
Boolean
exi:boolean
boolean
Date-Time
exi:dateTime
dateTime
exi:time
time
exi:date
date
exi:gYearMonth
gYearMonth
exi:gYear
gYear
exi:gMonthDay
gMonthDay
exi:gDay
gDay
exi:gMonth
gMonth
Decimal
exi:decimal
decimal
Float
exi:double
float
double
Integer
exi:integer
integer
String
exi:string
string
anySimpleType
anyURI
duration
QName
(except for values of xsi:type attribute),
Notation
All types derived by
union
n-bit Unsigned Integer
Used by
Integer
datatype representation for some bounded
integers
(see
7.1.5 Integer
Unsigned Integer
Used by
Integer
datatype representation for unsigned
integers
(see
7.1.5 Integer
List
All types derived by
list
, including
IDREFS
and
ENTITIES
QName
QName
but only for
the value of xsi:type attributes
By default, datatypes derived from the XML Schema datatypes above are also represented according to the associated
built-in EXI datatype representation
. When there are more than one XML Schema datatypes above from which a datatype is derived directly or indirectly, the closest ancestor is used to determine the
built-in EXI datatype representation
. For example, a value of XML Schema datatype xsd:int is represented according to the same
built-in EXI datatype representation
as a value of XML Schema datatype xsd:integer. Although xsd:int is derived indirectly from xsd:integer and also further from xsd:decimal, a value of xsd:int is processed as an instance of xsd:integer because xsd:integer is closer to xsd:int than xsd:decimal is in the datatype inheritance hierarchy.
Each EXI datatype identifier above is a
qname
that uniquely identifies one of the
built-in EXI datatype representations.
EXI datatype identifiers are used in
Datatype Representation Maps
to
change
the datatype representations used for specific schema
datatypes
XS2
and their sub-types.
Only
built-in EXI datatype representations
that have been assigned identifiers are usable in
Datatype Representation Maps
When the
preserve.lexicalValues
option is true, all
values
are represented as Strings.

The table below defines restricted character sets for several of the built-in EXI datatypes. Each
value
that would have otherwise been represented by one of these datatypes is instead represented as a String with the associated restricted character set,
regardless of the actual pattern facets, if any, specified in the definitions of the schema datatypes.
Table 7-2. Restricted Character Sets for Built-in EXI
Datatype Representations
EXI Datatype ID
Restricted Character Set
exi:base64Binary
{ #x9, #xA, #xD, #x20, +, /, [0-9], =, [A-Z], [a-z] }
exi:hexBinary
{ #x9, #xA, #xD, #x20, [0-9], [A-F], [a-f] }
exi:boolean
{ #x9, #xA, #xD, #x20, 0, 1, a, e, f, l, r, s, t, u }
exi:dateTime
{ #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z }
exi:time
exi:date
exi:gYearMonth
exi:gYear
exi:gMonthDay
exi:gDay
exi:gMonth
exi:decimal
{ #x9, #xA, #xD, #x20, +, -, ., [0-9] }
exi:double
{ #x9, #xA, #xD, #x20, +, -, ., [0-9], E, F, I, N, a, e }
exi:integer
{ #x9, #xA, #xD, #x20, +, -, [0-9] }
The restricted character set for the EXI List datatype representation is the restricted character set of the EXI datatype representation of the List item type.
The rules used to represent values of String depend on the
content items
to which the values belong. There are certain
content items
whose value representation involve the use of string tables while other
content items
are represented using the encoding rule described in
7.1.10 String
without involvement of string tables. The
content items
that use string tables and how each of such
content items
uses string tables to represent their values are described in
7.3 String Table
Schemas can provide one or more enumerated values for
datatypes.
When the
preserve.lexicalValues
option is false,
EXI exploits those pre-defined values when they are available to represent values of such
datatypes
in a more efficient manner than
would have done otherwise without using pre-defined values.
The encoding rule for representing
enumerated values
is described in
7.2 Enumerations
Datatypes
that are derived from
another
by union and their subtypes are always represented as String regardless of the availability of enumerated values. Representation of values of which the
datatype
is one of QName, Notation or a
datatype
derived therefrom by restriction are also not affected by enumerated values if any.
7.1 Built-in EXI Datatype Representations
The following sections describe the
built-in EXI datatype representations
used for representing
event codes
and
content items
in EXI streams. Unless otherwise stated, individual items in an EXI stream are packed into bytes most significant bit first.
7.1.1 Binary
The Binary datatype representation is a length-prefixed sequence of octets representing the binary content. The length is represented as an Unsigned Integer (see
7.1.6 Unsigned Integer
).
7.1.2 Boolean
In the absence of pattern facets in the schema datatype, the Boolean datatype representation is a
-bit unsigned integer (
7.1.9 n-bit Unsigned Integer
), where
is one (1). The value zero (0) represents false and the value one (1) represents true.
Otherwise, when pattern facets are available in the schema datatype, the Boolean datatype representation is a
-bit unsigned integer (
7.1.9 n-bit Unsigned Integer
), where
is two (2) and the values zero (0), one (1), two (2) and three (3) represent the values "false", "0", "true" and "1" respectively.
7.1.3 Decimal
The Decimal datatype representation is a Boolean sign (see
7.1.2 Boolean
) followed by two Unsigned Integers (see
7.1.6 Unsigned Integer
). A sign value of zero (0) is used to represent positive Decimal values and a sign value of one (1) is used to represent negative Decimal values. The first Unsigned Integer represents the integral portion of the Decimal value. The second Unsigned Integer represents the fractional portion of the Decimal value with the digits in reverse order to preserve leading zeros.
7.1.4 Float
The Float datatype representation is two consecutive Integers (see
7.1.5 Integer
). The first Integer represents the mantissa of the floating point number and the second Integer represents the base-10 exponent of the floating point number. The range of the mantissa is - (2
63
) to 2
63
-1 and the range of the exponent is - (2
14
-1) to 2
14
-1. Values typed as Float with a mantissa or exponent outside the accepted range are represented as untyped values.
The exponent value -(2
14
) is used to indicate one of the special values: infinity, negative infinity and not-a-number (NaN). An exponent value -(2
14
) with mantissa values 1 and -1 represents
positive infinity (INF) and negative infinity (-INF) respectively. An exponent value -(2
14
) with any other mantissa value represents NaN.
The Float datatype representation can be decoded by going through the following steps.
Retrieve the mantissa value using the procedure described in
7.1.5 Integer
Retrieve the exponent value using the procedure described in
7.1.5 Integer
If the exponent value is -(2
14
), the mantissa value 1 represents INF, the mantissa value -1 represents -INF and any other mantissa value represents NaN. If the exponent value is not -(2
14
), the float value is
× 10
where
is the mantissa and
is the exponent obtained in the preceding steps.
7.1.5 Integer
The Integer datatype representation supports signed integer numbers of arbitrary magnitude. The specific representation used depends on the
facet
XS2
values of the associated schema datatype as follows.
If the bounded range of the associated schema datatype is 4096 or smaller as determined by the values of
minInclusive
XS2
minExclusive
XS2
maxInclusive
XS2
and
maxExclusive
XS2
facets, the value is represented as an
n-bit Unsigned Integer
where
is ⌈ log
⌉ and
is the bounded range of the schema datatype.
Otherwise, if the
minInclusive
XS2
or
minExclusive
XS2
facets specify a lower bound greater than or equal to zero (0), the value is represented as an
Unsigned Integer
Otherwise, the value is represented as a Boolean sign (see
7.1.2 Boolean
) followed by an Unsigned Integer (see
7.1.6 Unsigned Integer
). A sign value of zero (0) is used to represent positive integers and a sign value of one (1) is used to represent negative integers. For non-negative values, the Unsigned Integer holds the magnitude of the value. For negative values, the Unsigned Integer holds the magnitude of the value minus 1.
7.1.6 Unsigned Integer
The Unsigned Integer datatype representation supports unsigned integer numbers of arbitrary magnitude. It is represented as a sequence of octets terminated by an octet with its most significant bit set to 0. The value of the unsigned integer is stored in the least significant 7 bits of the octets as a sequence of 7-bit bytes, with the least significant byte first.
EXI processors SHOULD support arbitrarily large Unsigned Integer values. EXI processors MUST support Unsigned Integer values less than 2147483648.
The Unsigned Integer datatype representation can be decoded by going through the following steps.
Example 7-1.
Example algorithm for decoding an Unsigned Integer
Start with the initial value set to 0 and the initial multiplier set to 1.
Read the next octet.
Multiply the value of the unsigned number represented by the 7 least significant bits of the octet by the current multiplier and add the result to the current value.
Multiply the multiplier by 128.
If the most significant bit of the octet was 1, go back to step 2.
7.1.7 QName
The QName datatype representation is a sequence of values representing the URI, local-name and prefix components of the QName in that order, where the prefix component is present only when the
preserve.prefixes
option is set to true.
When the QName value is specified by a schema-informed grammar using the SE (
qname
) or AT (
qname
) terminal symbols, URI and local-name are implicit and are omitted.
Similarly, when the URI of the QName value is derived from a schema-informed grammar using
SE (
uri
: *)
or AT (
uri
: *)
terminal symbols, URI is implicit thus omitted in the representation, and only the local-name component is encoded as a String (see
7.1.10 String
).
Otherwise, URI and local-name components are encoded as Strings.
If the QName is in no namespace, the URI is represented by a zero length String.
When present, prefixes are represented as
-bit unsigned integers (
7.1.9 n-bit Unsigned Integer
), where
is
⌈ log
) ⌉
and
is the number of
prefix
es in the prefix string table partition associated with the URI of the QName or one (1) if there are no prefixes in this partition.

If the given
prefix
exists in the associated prefix string table partition, it is represented using the compact identifier assiged by the partition. If the given
prefix
does not exist in the associated partition, the QName MUST be part of an SE event and the prefix MUST be resolved by one of the NS events immediately following the SE event (see resolution rules below). In this case, the unresolved prefix representation is not used and can be zero (0) or the compact identifer of any prefix in the associated partition.
Note:
When
is one, the prefix is represented using zero bits (i.e. omitted).
Given a
-bit unsigned integer
that represents either the prefix value or an unresolved prefix value, the effective prefix value is determined by following the rules described below in order. A QName is in error if its prefix cannot be resolved by the rules below.
If the prefix string table partition associated with the URI of the QName assigns the compact identifier
to a
prefix
value, select this
prefix
value as the candidate
prefix
value. Otherwise, there is no candidate
prefix
value.
If the QName value is part of an SE event followed by an associated NS event with
its
local-element-ns
flag value
set to true, the
prefix
value is the
prefix
of this NS event. Otherwise, the
prefix
value is the candidate value, if any, selected in step 1 above.
7.1.8 Date-Time
The Date-Time datatype representation is a sequence
of values representing the individual components of the Date-Time. The
following table specifies each of the possible date-time components
along with how they are encoded.
Table 7-3. Date-Time components
Component
Value
Type
Year
Offset from 2000
Integer (
7.1.5 Integer
MonthDay
Month * 32 + Day
9-bit Unsigned Integer (
7.1.9 n-bit Unsigned Integer
) where day is a value in the range 1-31 and month is a value in the range 1-12.
Time
((Hour * 64) + Minutes) * 64 + seconds
17-bit Unsigned Integer (
7.1.9 n-bit Unsigned Integer
FractionalSecs
Fractional seconds
Unsigned Integer (
7.1.6 Unsigned Integer
) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros
TimeZone
TZHours * 60 + TZMinutes
11-bit Unsigned Integer (
7.1.9 n-bit Unsigned Integer
) representing a signed integer offset by 840 ( = 14 * 60 )
presence
Boolean presence indicator
Boolean (
7.1.2 Boolean
The variety of components that constitute a value and their appearance order depend on the XML Schema type associated with the value. The following table shows which components are included in a value of each XML Schema type that is relevant to Date-Time datatype. Items listed in square brackets are included if and only if the value of its preceding presence indicator (specified above) is set to true.
Table 7-4. Assortment of Date-Time components
XML Schema Datatype
Included Components
gYear
XS2
Year, presence, [TimeZone]
gYearMonth
XS2
Year, MonthDay, presence, [TimeZone]
date
XS2
dateTime
XS2
Year, MonthDay, Time, presence, [FractionalSecs], presence, [TimeZone]
gMonth
XS2
MonthDay, presence, [TimeZone]
gMonthDay
XS2
gDay
XS2
time
XS2
Time, presence, [FractionalSecs], presence, [TimeZone]
7.1.9
-bit Unsigned Integer
When the value of the
compression option
is false and
the
bit-packed
alignment is used,
the
-bit Unsigned Integer datatype representation is an unsigned binary integer using
bits.
Otherwise, it is an unsigned integer using the minimum number of bytes required to store
bits. Bytes are ordered with the least significant byte first.
The
-bit unsigned integer is used for representing
event codes
, the prefix component of QNames (see
7.1.7 QName
) and certain
value
content items, as described in respective relevant parts of this document. As described in section
7.1.5 Integer
, integers with a bounded range size
equal to
4096
or smaller are represented as
-bit unsigned integers with
being ⌈ log
⌉, as an offset from the minimum value in the range.
7.1.10 String
The String datatype representation is a length prefixed sequence of
characters. The length indicates the number of characters in the
string and is represented as an Unsigned Integer (see
7.1.6 Unsigned Integer
). If a restricted character set is defined for the string (see
7.1.10.1 Restricted Character Sets
), each character is represented as an
-bit Unsigned Integer (see
7.1.9 n-bit Unsigned Integer
). Otherwise, each character is represented by its UCS
[ISO/IEC 10646]
code point encoded as an Unsigned Integer (see
7.1.6 Unsigned Integer
).
EXI uses a string table to represent certain
content items
more efficiently. Section
7.3 String Table
describes the string table and how it is applied to different content
items.
7.1.10.1 Restricted Character Sets
If a string value is associated with a schema
datatype
XS2
derived from xsd:string and one or more of the datatypes in its datatype hierarchy has one or more pattern facets, there may be a restricted character set defined for the string value. The following steps are used to determine the restricted character set, if any, defined for a given string value associated with such a schema datatype.
Given the schema datatype, let the target datatype definition be the definition of the most-derived datatype that has one or more pattern facets immediately specified in its definition in the schema among those in the datatype inheritance hierarchy that traces backwards toward
primitive datatypes
XS2
starting from the datatype.
If the target datatype definition is a definition for a
built-in datatype
XS2
, there is no restricted character set for the string value.Otherwise,
determine the set of characters for each immediate pattern facet of the target datatype definition according to section
E Deriving
Set of Characters
from XML Schema Regular Expressions
Then, compute the restricted set of characters for the string value as the union of all the sets of characters computed in the previous step. If the resulting set of characters contains less than
256
characters and contains only BMP characters, the string value has a restricted character set and each character is represented using an
-bit Unsigned Integer (see
7.1.9 n-bit Unsigned Integer
), where
is ⌈ log
+ 1) ⌉ and
is the number of characters in the restricted character set.
The characters in the restricted character set are sorted by UCS
[ISO/IEC 10646]
code point and represented by integer values in the range (0 ...
−1) according to their ordinal position in the set. Characters that are not in this set are represented by the
-bit Unsigned Integer
followed by the UCS code point of the character represented as an Unsigned Integer.
The figure below illustrates an overview of the process for determining and using restricted character sets described in this section.
Figure 7-1.
String Processing Model
7.1.11 List
Values of type List are encoded as a length
prefixed sequence of values. The length is encoded as an Unsigned Integer (see
7.1.6 Unsigned Integer
) and each value is encoded according
to its type (see
7. Representing Event Content
).
7.2 Enumerations
When the
preserve.lexicalValues
option is false,
enumerated values
are encoded as
-bit Unsigned Integers (
7.1.9 n-bit Unsigned Integer
) where
= ⌈ log
⌉ and
is the number of items
in the enumerated type. The value assigned to each item corresponds to
its ordinal position in the enumeration in schema-order starting with
position zero (0).
Exceptions are for schema types derived from others by union and their subtypes, QName or Notation and types derived therefrom by restriction. The values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.
7.3 String Table
EXI uses a string table to assign "compact identifiers" to some
string values. Occurrences of string values found in the string table
are represented using the associated compact identifier rather than
encoding the entire "string literal". The string table is initially pre-populated with
string values that are likely to occur in certain contexts and is
dynamically expanded to include additional string values encountered
in the document. The following
content items
are encoded using a
string table:
uris
prefixes
uri
and
local-name
in
qnames
values
When a string value is found in the string table, the value is encoded
using the compact identifier and no changes are made to the string table as a result.
When a string value is not found in the string table, its string literal is encoded
as a String without using a compact identifier, only after which
the string table is augmented by including the string value with an assigned
compact identifier
unless the string value represents a
value
content item
and fails to satisfy the criteria in effect by virtue of
valuePartitionCapacity
and
valueMaxLength
options

The string table is divided into partitions and each partition is
optimized for more frequent use of either compact identifiers or string literals
depending on the purpose of the partition. Section
7.3.1 String Table Partitions
describes how EXI string table is
partitioned. Section
7.3.2 Partitions Optimized for Frequent use of Compact Identifiers
describes how string values are encoded when the associated partition
is optimized for more frequent use of compact identifiers. Section
7.3.3 Partitions Optimized for Frequent use of String Literals
describes how string values are
encoded when the associated partition is optimized for more frequent use
of string literals.
The life cycle of a string table spans the processing of
a single EXI stream. String tables are not represented in an EXI stream or exchanged
between EXI processors. A string table cannot be reused across multiple EXI streams;
therefore, EXI processors MUST use a string table that is equivalent to
the one that would have been newly created and pre-populated with initial
values for processing each EXI stream.
7.3.1 String Table Partitions
The string table is organized into partitions
so that the indices assigned to compact identifiers can stay relatively small.
Smaller number of indices results in improved average compactness and the efficiency
of table operations. Each partition has a separate set of compact identifiers and
content items
are assigned to specific partitions as described below.
Uri
content items and the URI portion of
qname
content items are assigned to the uri
partition. The uri partition is optimized for frequent use of compact identifiers and is
pre-populated with initial entries as described in
D.1 Initial Entries in Uri Partition
When a schema is provided, the uri partition is also pre-populated with
the name of each
target
namespace URI declared in the schema,
plus some of the namespace URIs used in wildcard terms
and attribute wildcards
(see section
8.5.4.1.7 Wildcard Terms
and
8.5.4.1.3.2 Complex Type Grammars
, respectively
for the condition),
appended in lexicographical order.
Prefix
content items are assigned to partitions based
on their associated namespace URI. Partitions containing
prefix
content items are optimized for frequent use of compact identifiers and the
string table is pre-populated with entries as described in
D.2 Initial Entries in Prefix Partitions
The local-name portion of
qname
content items are assigned to partitions based
on the namespace URI of
the
qname
content item of which the local-name is a part. Partitions containing local-names are optimized for frequent use of string literals and the string table is pre-populated
with entries as described in
D.3 Initial Entries in Local-Name Partitions
When a schema is provided, the string table is also pre-populated with the
local name of each attribute, element and type declared in the
schema, partitioned by namespace URI and sorted lexicographically.
Each
value
content item is assigned to both the global value partition
and a "local" value partition based on the
qname
of the attribute or element in context at the time
the value is added to the value partitions.
Partitions containing
value
content items are optimized for frequent use of string literals and are initially empty.
[Definition:]
The variable
globalID
is a non-negative integer representing the compact identifier of the next item added to the global value partition.
Its value is initially set to 0 (zero) and changes while processing an EXI stream per the rule described in
7.3.3 Partitions Optimized for Frequent use of String Literals
7.3.2 Partitions Optimized for Frequent use of Compact Identifiers
String table partitions that are expected to contain a relatively
small number of entries used repeatedly throughout the document are
optimized for the frequent use of compact identifiers. This includes the
uri
partition and
all partitions containing
prefix
content items.
When a string value is found in a partition optimized for frequent use of compact identifiers,
the string value is represented as the value (
+1)
encoded as an
-bit Unsigned Integer (
7.1.9 n-bit Unsigned Integer
), where
is the value of the compact identifier,
is
⌈ log
+1) ⌉ and
is the number of
entries in the string table partition at the time of the operation.
When a string value is not found in a partition optimized for frequent use of compact identifiers,
the String value is represented as zero (0) encoded as an
-bit Unsigned Integer, followed by the string literal
encoded as a String (
7.1.10 String
). After
encoding the String value, it is added to the string table partition
and assigned the next available compact identifier
7.3.3 Partitions Optimized for Frequent use of String Literals
The remaining string table partitions are optimized for
the frequent use of string literals. This includes all string table partitions containing
local-names
and all string table partitions containing
value
content
items.
When a string value is found in the partitions containing
local-names, the
string value is represented as zero (0) encoded as an Unsigned Integer (see
7.1.6 Unsigned Integer
) followed by
the compact identifier of the string value. The compact identifier of the string
value is encoded as an
-bit unsigned integer (
7.1.9 n-bit Unsigned Integer
), where
is ⌈ log
⌉ and
is
the number of entries in the string table partition at the time of the operation.
When a string value is not found in the partitions containing
local-names, its
string literal is encoded as a String (see
7.1.10 String
) with the length of the string incremented
by one. After encoding the string value, it is added to the string
table partition and assigned the next available compact
identifier
As described above, each
value
content item is assigned
to two partitions, a "local" value partition and the global
value partition. When a string value is found in the "local" value partition,
the string value may be represented as zero (0) encoded as an Unsigned Integer (see
7.1.6 Unsigned Integer
) followed by the compact identifier
of the string value in the "local" value partition.
When a string value is found in the global value partition, the String value may be represented as one (1) encoded as an
Unsigned Integer (see
7.1.6 Unsigned Integer
) followed by the compact
identifier of the String value in the global value
partition. The compact identifier is encoded as an
-bit
unsigned integer (
7.1.9 n-bit Unsigned Integer
), where
is ⌈ log
⌉ and
is the number of entries in the
associated partition at the time of the operation.
When a string value
is not found in the global or "local"
value
partition, its string literal is encoded as a
String (see
7.1.10 String
) with the length
+ 2 (incremented by two) where
is the number of characters in the string value.
If
valuePartitionCapacity
is not zero, and
is greater than zero and no more than
valueMaxLength
, the string
is added to the associated "local" value partition using the next available compact identifier
and added to the global value partition using the compact identifier
globalID
. When
is added to the global value partition and there was already a string
in the global value partition associated with the compact identifier
globalID
, the string
replaces the string
in the global table, and the string
is removed from its associated local value partition by rendering its compact identifier permanently unassigned. When the string value is added to the global value partition, the value of
globalID
is incremented by one (1). If the resulting value of
globalID
is equal to
valuePartitionCapacity
, its value is reset to zero (0)
7.4 Datatype Representation Map
By default, each typed value in an EXI stream is represented using its
default built-in EXI datatype representation (see
Table 7-1
).
However,
[Definition:]
EXI processors
MAY provide the capability to specify alternate built-in EXI datatype representations or
user-defined datatype representations for specific schema
datatypes
XS2
This capability is called a
Datatype Representation Map
Note:
This feature is relevant only to simple types in the schema.
EXI does not provide a way for applications to infuse custom representations of structured data bound to complex types into the format.
EXI processors that support Datatype Representation Maps MAY provide implementation specific means to define and install user-defined datatype representations. EXI processors MAY also provide implementation specific means for applications or users to specify alternate built-in EXI datatype representations or user-defined datatype representations for representing specific schema datatypes. As with the default EXI datatype representations, alternate datatype representations are used for the associated XML Schema types specified in the Datatype Representation Map and XML Schema datatypes derived from those datatypes. When there are built-in or user-defined datatype representations associated with more than one XML Schema datatype in the type hierarchy of a particular datatype, the closest ancestor with an associated datatype representation is used to determine the EXI datatype representation.
When an EXI processor encodes an EXI stream using
a Datatype Representation Map and the
EXI Options
part of the header is present, the EXI options part MUST specify all alternate datatype representations used in the EXI stream.
An EXI processor that attempts to decode an
EXI stream that specifies a user-defined datatype representation in the EXI header that
it does not recognize MAY report a warning, but this is not an
error. However, when an EXI processor encounters a typed value that
was encoded by a user-defined datatype representation that it does not support, it MUST
report an error.
The EXI options header, when it appears in an EXI stream, MUST include a
"datatypeRepresentationMap" element for each
schema datatype
of which the descendant datatypes derived by restriction as well as itself are
not represented using the default
built-in EXI datatype representation.

The "datatypeRepresentationMap" element includes two child elements.
The
qname
of
the first child element identifies the schema datatype that is not
represented using the default
built-in EXI datatype representation

and the
qname
of the
second child element identifies the alternate
built-in EXI datatype representation or user-defined datatype representation
used to represent that type.
Built-in EXI datatype representations are identified by the type identifiers in
Table 7-1
For example, the following "datatypeRepresentationMap"element indicates all values of
type xsd:decimal
and the datatypes derived from it by restriction
in the EXI stream are represented using the built-in
String datatype representation, which has the type ID
exi:string
Example 7-2.
datatypeRepresentationMap indicating all Decimal values are represented using
built-in String datatype representation

It is the responsibility of an EXI processor to interface with a particular implementation of
built-in EXI datatype representations

or user-defined
datatype representations

properly. In the example above, an EXI processor may need to provide a string value of the data being processed that is typed as xsd:decimal in order to interface with
an implementation of built-in String datatype representation.

In such a case, some EXI processors may have started with a decimal value and such processors may well translate the value into a string before passing the data to
the implementation of built-in String datatype representation

while other EXI processors may already have a string value of the data so that it can pass the value directly to
the implementation of built-in String datatype representation

without any translation.
As another example, the following
"datatypeRepresentationMap" element indicates all
values of the used-defined
simple type
geo:geometricSurface
and the datatypes derived from it by restriction
are represented
using the user-defined datatype representation geo:geometricInterpolator:
Example 7-3.
datatypeRepresentationMap illustrating a user-defined type represented by a user-defined datatype representation

Note:
EXI only defines a way to indicate the use of user-defined datatype representations for representing values of specific datatypes.
Datatype representations are referred to by their respective
qnames
in "datatypeRepresentationMap" elements. A datatype representation is omnipresent only if its
qname
is one of those that represent built-in EXI datatype representations.
For datatype representations of other
qnames
, EXI does not provide nor suggest a method by which they are identified and shared between EXI Processors.
This suggests that the use of user-defined (i.e. custom) datatype representations
needs to be restrained by weighing alternatives and considering the consequences of each in pros and cons, in order to avoid unruly proliferation of documents that use such datatype representations.
Those applications that ever find Datatype Representation Map useful should make sure that they exchange such documents only among the parties that are pre-known or discovered to be able to process the user-defined datatype representations that are in use. Otherwise, if it is not for certain if a receiver undestands the particular user-defined datatype representations, the sender should never attempt to send documents that use user-defined datatype representations to that recipient.
8. EXI Grammars
EXI is a knowledge based encoding that uses a set of grammars to
determine which events are most likely to occur at any given point in
an EXI stream and encodes the most likely alternatives in fewer
bits. It does this by mapping the stream of events to a lower entropy
set of representative values and encoding those values using a set of
simple variable length codes or an EXI compression algorithm.
The result is a very simple, small algorithm that uniformly handles
schema-less encoding, schema-informed encoding, schema deviations,
and any combination thereof in EXI streams. These variations do
not require different algorithms or different parsers, they are simply
informed by different combinations of grammars.
The following sections describe the grammars used to inform the EXI encoding.
Note:
The grammar semantics in this specification are written for clarity and generality. They do not prescribe a particular implementation approach.
8.1 Grammar Notation
8.1.1 Fixed Event Codes
Each grammar production has an
event code
, which is represented by a sequence of one to three parts separated by periods ("."). Each part is an unsigned integer. The following are examples of grammar productions with
event codes
as they appear in this specification.
Example 8-1.
Example productions with fixed event codes
Productions
Event Codes
LeftHandSide
Terminal
NonTerminal
Terminal
NonTerminal
Terminal
NonTerminal
2.0
Terminal
NonTerminal
2.1
Terminal
NonTerminal
2.2.0
Terminal
NonTerminal
2.2.1
LeftHandSide
Terminal
NonTerminal
Terminal
NonTerminal
1.0
Terminal
NonTerminal
1.1
The number of parts in a given
event code
is called the event code's length. No two productions with the same non-terminal symbol on the left-hand side are permitted to have the same
event code
8.1.2 Variable Event Codes
Some non-terminal symbols are used on the right-hand side in a production without
a terminal symbol prefixed to them, but with a parenthesized event code affixed instead.
Such non-terminal symbols are macros and they are used to capture some recurring set of productions
as
symbols so that a symbol can be used in the grammar representation instead of including all the productions the macro represents in place every time it is used.
Example 8-2.
Example productions that use macro non-terminal symbols
ABigProduction
Terminal
NonTerminal
Terminal
NonTerminal
LEFTHANDSIDE
(2.0)
2.0
ABigProduction
Terminal
NonTerminal
LEFTHANDSIDE
(1.1)
1.1
Terminal
NonTerminal
1.2
Because non-terminal macros are injected into the right-hand side of more than one production,
the
event codes
of productions with these macro non-terminals on the left-hand side are not fixed, but will have different event code values depending on the context in which the macro non-terminal appears. This specification calls these variable event codes and uses variables in place of individual event code parts to indicate the event code parts are determined by the context. Below are some examples of variable event codes:
Example 8-3.
Example non-terminal macros and its productions with variable event codes
LEFTHANDSIDE
(n.m)
TERMINAL
NONTERMINAL
.0
TERMINAL
NONTERMINAL
.1
TERMINAL
NONTERMINAL
+2
TERMINAL
NONTERMINAL
+3
TERMINAL
NONTERMINAL
+4.0
TERMINAL
NONTERMINAL
+4.1
Unless otherwise specified, the variable
evaluates to the first part of the
event code
of the production in which the macro non-terminal
LEFTHANDSIDE
appears on the right-hand side. Similarly, the expression
represents the first two parts of the
event code
of the production in which the macro non-terminal
LEFTHANDSIDE
appears on the right-hand side.
Non-terminal macros are used in this specification for notational convenience only.
They are not non-terminals, even though they are used in place of non-terminals.
Productions that use non-terminal macros on the right-hand side need to be expanded by macro substitution before such productions are interpreted.
Therefore,
ABigProduction
and
ABigProduction
shown in the preceding example are equivalent to the following set of productions obtained by expanding the non-terminal macro symbol
LEFTHANDSIDE
and evaluating the variable event codes.
Example 8-4.
Expanded productions equivalent to the productions used above
ABigProduction
Terminal
NonTerminal
Terminal
NonTerminal
TERMINAL
NONTERMINAL
2.0
TERMINAL
NONTERMINAL
2.1
TERMINAL
NONTERMINAL
2.2
TERMINAL
NONTERMINAL
2.3
TERMINAL
NONTERMINAL
2.4.0
TERMINAL
NONTERMINAL
2.4.1
ABigProduction
Terminal
NonTerminal
TERMINAL
NONTERMINAL
1.0
TERMINAL
NONTERMINAL
1.1
Terminal
NonTerminal
1.2
TERMINAL
NONTERMINAL
1.3
TERMINAL
NONTERMINAL
1.4
TERMINAL
NONTERMINAL
1.5.0
TERMINAL
NONTERMINAL
1.5.1
8.2 Grammar Event Codes
Each production rule in the EXI grammar includes an
event code
value that approximates the likelihood the associated production rule will be matched over the other productions with the same left-hand-side non-terminal symbol. Ultimately, the
event codes
determine the value(s) by which each non-terminal symbol will be represented in the EXI stream.
To understand how a given
event code
approximates the likelihood a given production will match, it is useful to visualize the
event codes
for a set of production rules that have the same non-terminal symbol on the left-hand side as a tree. For example, the following set of productions:
Example 8-5.
Example productions with event codes
ElementContent
EE
SE (*)
ElementContent
1.0
CH
ElementContent
1.1
ER
ElementContent
1.2
CM
ElementContent
1.3.0
PI
ElementContent
1.3.1
represents a set of information items that might occur as element content after the start tag. Using the production
event codes
, we can visualize this set of productions as follows:
Figure 8-1.
Event code tree for ElementContent grammar
where the
terminal symbols
are represented by the leaf nodes of the tree, and the
event code
of each production rule
defines a path from the root of the tree to the node
that represents the terminal symbol that is on the right-hand side of the production.
We call this the event code tree for a given set of productions.
An event code tree is similar to a Huffman tree
[Huffman Coding]
in that shorter paths are generally used for symbols that are considered more likely. However, event code trees are far simpler and less costly to compute and maintain. Event code trees are shallow and contain at most three levels. In addition, the length of each
event code
in the event code tree is assigned statically without analyzing the data. This classification provides some of the benefits of a Huffman tree without the cost.
8.3 Pruning Unneeded Productions
As discussed in section
6.3 Fidelity Options
, applications MAY provide a set of fidelity options to specify the XML features they require. EXI processors MUST use these fidelity options to prune
the productions of which the terminal symbols represent the events that are not required from the grammars,
improving compactness and processing efficiency.
For example, the following set of productions represent the set of information items that might occur as element content after the start tag.
Example 8-6.
Example productions with full fidelity
ElementContent
EE
SE (*)
ElementContent
1.0
CH
ElementContent
1.1
ER
ElementContent
1.2
CM
ElementContent
1.3.0
PI
ElementContent
1.3.1
If an application sets the fidelity options preserve.comments, preserve.pis and preserve.dtd to false, the productions matching comment (CM), processing instruction (PI) and entity reference (ER) events are pruned from the grammar, producing the following set of productions:
Example 8-7.
Example productions after pruning
ElementContent
EE
SE (*)
ElementContent
1.0
CH
ElementContent
1.1
Removing these productions from the grammar tells EXI processors that comments and processing instructions will never occur in the EXI stream, which reduces the entropy of the stream allowing it to be encoded in fewer bits.
Each time a production is removed from a grammar, the
event codes
of the other productions with the same non-terminal symbol on the left-hand side MUST be adjusted to keep them contiguous if its removal has left the remaining productions with non-contiguous event codes.
8.4 Built-in XML Grammars
This section describes the built-in XML grammars used by EXI when no schema information is available or when available schema information describes only portions of the EXI stream.
The built-in XML grammars are dynamic and continuously evolve to reflect knowledge learned while processing an EXI stream. New
built-in element grammars
are created to describe the content of newly encountered elements and new grammar productions are added to refine existing built-in grammars. Newly learned grammars and productions are used to more efficiently represent subsequent events in the EXI stream. All newly created
built-in element grammars
are
global element grammars
[Definition:]
global element grammar
is a grammar describing the content of an element that has global scope (i.e. a global element).
At the onset of processing an EXI stream, the set of global element grammars is the set of all schema-informed element grammars derived from element declarations that have a {scope} property of
global
. Each
built-in element grammar
created while processing an EXI stream is added to the set of global element grammars. Each global element
grammar
has a unique
qname
8.4.1 Built-in Document Grammar
In the absence of schema information describing the content of the EXI stream, the following grammar describes the events that will occur in an
EXI document
Syntax
Event Code
Document
SD
DocContent
DocContent
SE (*)
DocEnd
DT
DocContent
1.0
CM
DocContent
1.1.0
PI
DocContent
1.1.1
DocEnd
ED
CM
DocEnd
1.0
PI
DocEnd
1.1
Semantics:
All productions in the built-in document grammars of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
8.4.2 Built-in Fragment Grammar
In the absence of schema information describing the contents of an EXI stream, the following grammar describes the events that may occur in an
EXI fragment
. The grammar below represents the initial set of productions in the built-in fragment grammar at the start of EXI stream processing. The associated semantics explain how the built-in fragment grammar evolves to more efficiently represent subsequent events in the EXI stream.
Syntax
Event Code
Fragment
SD
FragmentContent
FragmentContent
SE (*)
FragmentContent
ED
CM
FragmentContent
2.0
PI
FragmentContent
2.1
Semantics:
All productions in the built-in fragment grammars of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Create a production of the form
LeftHandSide
: SE (
qname
RightHandSide
with an
event code
Increment the first part of the
event code
of each production in the current grammar with the non-terminal
LeftHandSide
on the left-hand side
Add the production created in step
to the grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
All productions of the form
LeftHandSide
: SE (
qname
RightHandSide
that were previously added to the grammar upon the first occurrence of the element that has the
qname
qname
are evaluated as follows when they are matched:
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
8.4.3 Built-in Element Grammar
[Definition:]
When no grammar exists for an element occuring in an EXI stream, a
built-in element grammar
is created for that element.
Built-in element grammars are initially generic and are progressively refined as the specific content for the associated element is learned. All built-in element grammars are
global element grammar
s and can be uniquely identified by the qname of the global element they describe. At the outset of processing an EXI stream, the set of built-in element grammars is empty.
Below is the initial set of productions used for all newly created
built-in element grammars
. The semantics describe how productions are added to each
built-in element grammar
as the content of the associated element is learned.
Syntax
Event Code
StartTagContent
EE
0.0
AT (*)
StartTagContent
0.1
NS
StartTagContent
0.2
SC
Fragment
0.3
ChildContentItems
(0.4)
ElementContent
EE
ChildContentItems
(1.0)
ChildContentItems (n.m)
SE (*)
ElementContent
CH
ElementContent
.(
+1)
ER
ElementContent
.(
+2)
CM
ElementContent
.(
+3).0
PI
ElementContent
.(
+3).1
Note:
When the value of the
selfContained option
is false,
the production with the terminal symbol SC on the right-hand side is absent from the above grammar, and
the use of the non-terminal macro
ChildContentItems
that has
StartTagContent
non-terminal on the left-hand side (shown in bold above) gets expanded with variable values (0.3) instead of (0.4) used above.
When the xsi:type attribute appears in an element where the
built-in element grammar
is in effect, it MUST occur before any other AT events of the same element.
Semantics:
All productions in the
built-in element grammar
of the form
LeftHandSide
: AT (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the attribute matched by AT (*)
Create a production of the form
LeftHandSide
: AT (
qname
RightHandSide
with an
event code
0 and increment the first part of the event code of each production in the current grammar with the non-terminal
LeftHandSide
on the left-hand side. Add this production to the grammar.
If
qname
is xsi:type, let
target-type
be the value of the xsi:type attribute and assign it the QName datatype representation (see
7.1.7 QName
). If there is no namespace in scope for the specified qname prefix, set the
uri
of
target-type
to empty ("") and the
localName
to the full lexical value of the QName, including the prefix. Encode
target-type
according to section
7. Representing Event Content
. If a grammar can be found for the
target-type
type using the encoded
target-type
representation, evaluate the element contents using the grammar for
target-type
type instead of
RightHandSide
All productions of the form
LeftHandSide
: SC
Fragment
are evaluated as follows:
Save the string table, grammars and any implementation-specific state learned while processing this EXI Body.
Initialize the string table, grammars and any implementation-specific state learned while processing this EXI Body to the state they held just prior to processing this EXI Body.
Skip to the next byte-aligned boundary in the stream
if it is not already at such a boundary.
Let
qname
be the
qname
of the SE event immediately preceding this SC event.
Let
content
be the sequence of events following this SC event that match the grammar for element
qname
, up to and including the terminating EE event.
Evaluate the sequence of events (SD, SE(
qname
),
content
, ED) according to the
Fragment
grammar (see
8.4.2 Built-in Fragment Grammar
).
Skip to the next byte-aligned boundary in the stream
if it is not already at such a boundary.
Restore the string table, grammars and implementation-specific state learned while processing this EXI Body to that saved in step 1 above.
All productions in the
built-in element grammar
of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Create a production of the form
LeftHandSide
: SE (
qname
RightHandSide
with an
event code
Increment the first part of the
event code
of each production in the current grammar with the non-terminal
LeftHandSide
on the left-hand side
Add the production created in step
to the grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
All productions of the form
LeftHandSide
: SE (
qname
RightHandSide
that were previously added to the grammar upon the first occurrence of the element that has the
qname
qname
are evaluated as follows when they are matched:
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
All productions in the
built-in element grammar
of the form
LeftHandSide
: CH
RightHandSide
are evaluated as follows:
If a production of the form,
LeftHandSide
: CH
RightHandSide
with an
event code
of length 1 does not exist in the current element grammar, create one with event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminal
LeftHandSide
on the left-hand side.
Add the production created in step 1 to the grammar
Evaluate the remainder of event sequence using
RightHandSide
All productions in the
built-in element grammar
of the form
LeftHandSide
: EE
are evaluated as follows:
If a production of the form,
LeftHandSide
: EE
with an
event code
of length 1 does not exist in the current element grammar, create one with event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminal
LeftHandSide
on the left-hand side.
Add the production created in step 1 to the grammar
8.5 Schema-informed Grammars
This section describes the schema-informed grammars used by EXI when schema information is available to describe the contents of the
EXI stream
Schema information used for processing an EXI stream is either indicated by the header option
schemaID
, or communicated out-of-band in the absence of
schemaID
Schema-informed grammars are independent of any particular schema language and can be derived from XML Schemas
[XML Schema Structures]
[XML Schema Datatypes]
, RELAX NG schemas
[ISO/IEC 19757-2:2003]
, DTDs
[XML 1.0]
[XML 1.1]
or other
schema languages

for describing what is likely to occur in an EXI stream.
Schema-informed grammars accept all XML documents and fragments regardless of whether and how closely they match the schema. The
EXI stream encoder
encodes individual events using schema-informed grammars where they are available and falls back to the built-in XML grammars where they are not. In general, events for which a schema-informed grammar exists will be encoded more efficiently.
Unlike built-in XML grammars, schema-informed grammars are static and do not evolve, which permits the reuse of schema-informed grammars across the processing of multiple EXI streams. This is a single outstanding difference between the two grammar systems.
It is important to note that schema-informed and built-in grammars are often used together within the context of a single
EXI stream
. While processing a schema-informed grammar, built-in grammars may be created to represent schema deviations or elements that match wildcards declared in the schema. Even though these built-in grammars occur in the context of a schema-informed stream, they are still dynamic and evolve to represent content learned while processing the EXI stream as is described in
8.4 Built-in XML Grammars
8.5.1 Schema-informed Document Grammar
When schema information is available to describe the contents of an
EXI stream
, the following grammar describes the events that will occur in an
EXI document
Syntax
Event Code
Document
SD
DocContent
DocContent
SE (G
DocEnd
SE (G
DocEnd
SE (G
−1
DocEnd
−1
SE (*)
DocEnd
DT
DocContent
+1).0
CM
DocContent
+1).1.0
PI
DocContent
+1).1.1
Note:
The variable
in the grammar above is the number of global elements declared in the schema.
, G
, ... G
−1
represent all the
qnames
of global elements sorted lexicographically, first by local-name, then by uri.
DocEnd
ED
CM
DocEnd
1.0
PI
DocEnd
1.1
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
8.5.2 Schema-informed Fragment Grammar
When schema information is available to describe the contents of an
EXI stream
, the following grammar describes the events that will occur in an
EXI fragment
Syntax
Event Code
Fragment
SD
FragmentContent
FragmentContent
SE (F
FragmentContent
SE (F
FragmentContent
SE (F
−1
FragmentContent
−1
SE (*)
FragmentContent
ED
+1
CM
FragmentContent
+2).0
PI
FragmentContent
+2).1
Note:
The variable
in the grammar above represents the number of unique element
qnames
declared in the schema. The variables F
, F
, ... F
n−1
represent these
qnames
sorted lexicographically, first by local-name, then by uri. If there is more than one element declared with the same
qname
, the
qname
is included only once.
If all such elements have the same schema type name and {nillable} property value, their content is evaluated according to the specific grammar for that element declaration.
Otherwise, their content is evaluated according to the relaxed Element Fragment grammar described in
8.5.3 Schema-informed Element Fragment Grammar
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
8.5.3 Schema-informed Element Fragment Grammar
[Definition:]
When schema information is available to describe the contents of an
EXI stream
and more than one element is declared with the same
qname
but not all such elements have the same type name and {nillable} property value,
the
Schema-informed Element Fragment Grammar
are used for processing the events that may occur in such elements when they occur inside an EXI fragment or EXI Element Fragment.
The schema-informed element fragment grammar consists of
ElementFragment
and
ElementFragmentTypeEmpty
which are defined below.
ElementFragment
is a grammar that accounts both element declarations and attribute declarations in the schemas, whereas
ElementFragmentTypeEmpty
is a grammar that regards only attribute declarations.
Syntax
Event Code
ElementFragment
AT (A
) [schema-type value]
ElementFragment
AT (A
) [schema-typed value]
ElementFragment
AT (A
−1
) [schema-typed value]
ElementFragment
−1
AT (*)
ElementFragment
SE (F
ElementFragment
+1
SE (F
ElementFragment
+2
SE (F
-1
ElementFragment
SE (*)
ElementFragment
+1
EE
+2
CH [untyped value]
ElementFragment
+3
ElementFragment
SE (F
ElementFragment
SE (F
ElementFragment
SE (F
-1
ElementFragment
-1
SE (*)
ElementFragment
EE
+1
CH [untyped value]
ElementFragment
+2
ElementFragmentTypeEmpty
AT (A
) [schema-valid value]
ElementFragmentTypeEmpty
AT (A
) [schema-valid value]
ElementFragmentTypeEmpty
AT (A
−1
) [schema-valid value]
ElementFragmentTypeEmpty
−1
AT (*)
ElementFragmentTypeEmpty
EE
+1
ElementFragmentTypeEmpty
EE
Note:
The variable
in the grammar above represents the number of unique attribute
qnames
declared in the schema. The variables A
, A
, ... A
−1
represent these qnames sorted lexicographically, first by local-name, then by uri. If there is more than one attribute declared with the same
qname
, the qname is included only once.
If all such attributes have the same schema type name, their
value
is represented using that type.
Otherwise, their
value
is represented as a String.
The variable
in the grammar above represents the number of unique element
qnames
declared in the schema. The variables F
, F
, ... F
-1
represent these qnames sorted lexicographically, first by local-name, then by uri. If there is more than one element declared with the same
qname
, the qname is included only once.
If all such elements have the same type name and {nillable} property value, their content is evaluated according to specific grammar for that element declaration.
Otherwise, their content is evaluated according to the relaxed Element Fragment grammar described above.
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
All productions in the schema-informed element fragment grammar of the form
LeftHandSide
: AT (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the attribute matched by AT (*)
If
qname
is xsi:type, let
target-type
be the value of the xsi:type attribute and assign it the QName datatype representation (see
7.1.7 QName
). If there is no namespace in scope for the specified qname prefix, set the
uri
of
target-type
to empty ("") and the
localName
to the full lexical value of the QName, including the prefix. Encode
target-type
according to section
7. Representing Event Content
. If a grammar can be found for the
target-type
type using the encoded
target-type
representation, evaluate the element contents using the grammar for
target-type
type instead of
RightHandSide
If
qname
is xsi:nil, let
nil
be the value of the xsi:nil attribute. If
nil
is a valid Boolean value, assign it the Boolean datatype representation (see
7.1.2 Boolean
) and encode it according to section
7. Representing Event Content
. If
nil
is not a valid Boolean value, represent the
xsi:nil attribute event using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
). If the value of
nil
is true, evaluate the element contents using the grammar
ElementFragmentTypeEmpty
defined above rather than
RightHandSide
If a global attribute definition exists for
qname
, let
global-type
be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated with
global-type
, it SHOULD be represented using the datatype representation associated with
global-type
(see
7. Representing Event Content
). If the attribute value is not represented using the datatype representation associated with
global-type
, represent the
attribute event
using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
).
As with all schema informed element grammars, the schema-informed element fragment grammar is augmented with additional productions that describe events that may occur in an EXI stream, but are not explicity declared in the schema. The process for augmenting the grammar is described in
8.5.4.4 Undeclared Productions
For the purposes of this process, the schema-informed element fragment grammar is treated as though it is created from an element declaration with a {nillable} property value of true and a type declaration that has named sub-types, and
ElementFragmentTypeEmpty
is used to serve as the
TypeEmpty
of the type in the process.
The
content
index of grammars
ElementFragment
and
ElementFragmentTypeEmpty
are both 1 (one).
8.5.4 Schema-informed Element and Type Grammars
[Definition:]
When one or more XML Schema is available to describe the contents of an EXI stream, a
schema-informed element grammar
Element
is derived for each element declaration
described by the schemas, where 0 ≤
and
is the number of element declarations in the schema.
[Definition:]
When one or more XML Schema is available to describe the contents of an EXI stream, a
schema-informed type grammar
Type
is derived
for each named type declaration
described by the schemas as well as for each of the
built-in primitive types
XS2
and
built-in derived types
XS2
, the
complex ur-type
XS1
and the
simple ur-type
XS2
defined by XML Schema specification
[XML Schema Structures]
[XML Schema Datatypes]
, where 0 ≤
and
is the total number of such available types.
Each schema-informed element grammar and type grammar is constructed according to the following four steps:
Create a proto-grammar that describes the content model according to available schema information (see section
8.5.4.1 EXI Proto-Grammars
).
Normalize the proto-grammar into an EXI grammar (see section
8.5.4.2 EXI Normalized Grammars
).
Assign
event codes
to each production in the normalized EXI grammar (see section
8.5.4.3 Event Code Assignment
).
Add additional productions to the normalized EXI grammar to represent events that may occur in the EXI stream, but are not described by the schema, such as comments, processing-instructions, schema-deviations, etc. (see section
8.5.4.4 Undeclared Productions
).
Each element grammar
Element
includes a sequence of
non-terminals
Element
i, j
, where 0 ≤
. The content of the entire element is described by the first non-terminal
Element
i, 0
. The remaining non-terminals describe portions of the element content. Likewise, each type grammar
Type
includes a sequence of
non-terminals
Type
i, j
and the content of the entire type is described by the first non-terminal
Type
i, 0
The algorithms expressed in this section provide a concise and formal description of the EXI grammars for a given set of XML Schema definitions. More efficient algorithms likely exist for generating these EXI grammars and EXI implementations are free to use any algorithm that produces grammars and
event codes
that generate EXI encodings that match those produced by the grammars described here.
An example is provided in the appendix (see
H Schema-informed Grammar Examples
) that demonstrates the process described in this section to generate a complete schema-informed element grammar from an element declaration
in a schema.
8.5.4.1 EXI Proto-Grammars
This section describes the process for creating the EXI proto-grammars from XML Schema declarations and definitions. EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:
LeftHandSide
RightHandSide
where
LeftHandSide
and
RightHandSide
are both non-terminals. Whereas, all productions in a normalized EXI grammar contain exactly one terminal symbol and at most one non-terminal symbol on the right-hand side. This is a restricted form of Greibach normal form
[Greibach Normal Form]
EXI proto-grammars are derived from XML Schema in a straight-forward manner and can easily be normalized with simple algorithm (see
8.5.4.2 EXI Normalized Grammars
).
8.5.4.1.1 Grammar Concatenation Operator
Proto-grammars are specified in a modular, constructive fashion. XML Schema components such as terms, particles, attribute uses are transformed each into a distinct proto-grammar, leveraging proto-grammars of their sub-components. At various stages of proto-grammar construction, two or more of proto-grammars are concatenated one after another to form more composite grammars.
The grammar concatenation operator ⊕ is a binary, associative operator that creates a new grammar from its left and right grammar operands. The new grammar accepts any set of symbols accepted by its left operand followed by any set of symbols accepted by its right operand.
Given a left operand
Grammar
and a right operand
Grammar
, the following operation
Grammar
Grammar
creates a combined grammar by replacing each production of the form
Grammar
EE
where 0 ≤
and
is the number of non-terminals that occur on the left-hand side of productions in
Grammar
, with a production of the form
Grammar
Grammar
connecting each accept state of
Grammar
with the start state of
Grammar
8.5.4.1.2 Element Grammars
This section describes the process for creating an EXI element grammar from an XML Schema
element declaration
XS1
Given an element declaration
, with properties {name}, {target namespace}, {type definition}, {scope} and {nillable}, create a corresponding EXI grammar
Element
for evaluating the contents of elements in the specified {scope} with
qname
local-name = {name} and
qname
uri = {target namespace}
where
qname
is the
qname
of the elements.
Let
be the {type definition} of
and
Type
be the type grammar created from
. The grammar
Element
describing the content model of
is created as follows.
Syntax:
Element
i , 0
Type
j , 0
8.5.4.1.3 Type Grammars
Given an XML Schema type definition
with properties {name} and {target namespace},
two type grammars are created, which are denoted by
Type
and
TypeEmpty
[Definition:]
Type
is a grammar that fully reflects the type definition of
, whereas
[Definition:]
TypeEmpty
is a grammar that
regards
only the attribute uses and attribute wildcards of
, if any
The grammar
Type
is used for evaluating the content of elements that are defined to be of type
in the schema.
[Definition:]
Type
is a
global type grammar
when
is a named type.
Type
, when it is a global type grammar, can additionally be used as the effective grammar designated by a xsi:type attribute with the attribute value that is a
qname
with local-name = {name} and uri = {target namespace}.
TypeEmpty
is used in place of
Type
when the element instance that is being evaluated has a xsi:nil attribute with the value
true
[Definition:]
For each type grammar
Type
, an unique index number
content
is determined such that all non-terminal symbols of indices smaller than
content
have at least one
AT terminal symbol
and the rest of the non-terminal symbols in
Type
do not have
AT terminal symbols
on their right-hand side, where indices are assigned to non-terminal symbols in ascending order with the entry non-terminal symbol of
Type
being assined index 0 (zero).
There is also a
content
index associated with each
TypeEmpty
where its value is determined in the same manner as for
Type
Sections
8.5.4.1.3.1 Simple Type Grammars
and
8.5.4.1.3.2 Complex Type Grammars
describe the processes for creating
Type
and
TypeEmpty
from XML Schema
simple type definitions
XS1
and
complex type definitions
XS1
defined in schemas as well as
built-in primitive types
XS2
built-in derived types
XS2
and
simple ur-type
XS2
defined by XML Schema specification
[XML Schema Datatypes]

Section
8.5.4.1.3.3 Complex Ur-Type Grammar
defines the grammar used for processing instances of element contents of type
xsd:anyType
XS1
8.5.4.1.3.1 Simple Type Grammars
This section describes the process for creating an EXI type grammar from an XML Schema
simple type definition
XS1
Given a simple type definition
create two new EXI grammars
Type
and
TypeEmpty
following the procedure described below.
Add the following grammar productions to
Type
and
TypeEmpty
Syntax:
Type
i, 0
CH [schema-typed value]
Type
i, 1
Type
i, 1
EE
TypeEmpty
i, 0
EE
Note:
Productions of the form
LeftHandSide
: CH [schema-typed value]
RightHandSide
represent
typed character data that can be represented using the EXI datatype representation associated with the simple type definition (see
7. Representing Event Content
).

Character data that can be represented using the EXI datatype representation associated with the simple type definition SHOULD be represented this way. Character data that is not represented using the EXI datatype representation associated with the simple type definition is represented by productions of the form
LeftHandSide
: CH [untyped value]
RightHandSide
described in section
8.5.4.4 Undeclared Productions
The
content
index of grammar
Type
and
TypeEmpty
created from an XML Schema simple type definition is always 0 (zero).
8.5.4.1.3.2 Complex Type Grammars
This section describes the process for creating an EXI type grammar from an XML Schema
complex type definition
XS1
Given a complex type definition
, with properties {name}, {target namespace},
{attribute uses}, {attribute wildcard} and {content type},
create two EXI grammars
Type
and
TypeEmpty
following the procedure described below.
Generate a grammar
Attribute
, for each attribute use
in {attribute uses} according to section
8.5.4.1.4 Attribute Uses
Sort the attribute use grammars first by qname local-name, then by qname uri to form a sequence of grammars
, …,
n−1
, where
is the number of attribute uses in {attribute uses}.
If an {attribute wildcard} is specified, increment
and generate an additional attribute use grammar
n−1
as follows:
n−1, 0
EE
When the {attribute wildcard}'s {namespace constraint} is
any
, or
a pair of
not
and either a namespace name or the special value
absent
indicating no namespace,
add the following production to each grammar
generated above,
where 0 ≤
i, 0
AT (*)
i, 0
Otherwise, that is, when {namespace constraint} is a set of values whose members are namespace names or the special value
absent
indicating no namespace,
add the following production to each grammar
generated above
where 0 ≤
i, 0
AT(
uri
: *)
i, 0
where
uri
is a member value of {namespace constraint},
provided that it is the empty string (i.e. "") that is used as
uri
when the member value is the special value
absent
Each
uri
is used to augment the uri partition of the String table.
Section
7.3.1 String Table Partitions
describes how these
uri
strings are put into String table for pre-population.
Note:
When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other AT events of the same element, with xsi:type placed before xsi:nil when they both occur.
Semantics:
In complex type grammars,
all productions of the form
LeftHandSide
: AT (*)
RightHandSide
and
LeftHandSide
: AT(
uri
: *)
RightHandSide
that stem from attribute wildcards
are evaluated as follows:
Let
qname
be the
qname
of the attribute matched by AT (*) or AT(
uri
: *)
If
qname
is xsi:type, let
target-type
be the value of the xsi:type attribute and assign it the QName datatype representation (see
7.1.7 QName
). If there is no namespace in scope for the specified qname prefix, set the
uri
of
target-type
to empty ("") and the
localName
to the full lexical value of the QName, including the prefix. Encode
target-type
according to section
7. Representing Event Content
. If a grammar can be found for the
target-type
type using the encoded
target-type
representation, evaluate the element contents using the grammar for
target-type
type instead of
RightHandSide
If
qname
is xsi:nil, let
nil
be the value of the xsi:nil attribute. If
nil
is a valid Boolean value, assign it the Boolean datatype representation (see
7.1.2 Boolean
) and encode it according to section
7. Representing Event Content
. If
nil
is not a valid Boolean value, represent the
xsi:nil attribute event using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
). If the value of
nil
is true, evaluate the element contents using the grammar

for the
TypeEmpty
type defined below
rather than
RightHandSide
If a global attribute definition exists for
qname
, let
global-type
be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated with
global-type
, it SHOULD be represented using the datatype representation associated with
global-type
(see
7. Representing Event Content
). If the attribute value is not represented using the datatype representation associated with
global-type
, represent the
attribute event
using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
).
The grammar
TypeEmpty
is created by combining the sequence of attribute use grammars
terminated by an empty {content type} grammar
as follows:
TypeEmpty
⊕ … ⊕
n−1
Content
where the grammar
Content
is created as follows:
Content
i, 0
EE
The
content
index of grammar
TypeEmpty
is the index of its last non-terminal symbol.
The grammar
Type
is generated as follows.
If {content type} is a simple type definition
, generate a grammar
Content
as
Type
according to section
8.5.4.1.3.1 Simple Type Grammars
If {content type} has a content model particle, generate a grammar
Content
according to section
8.5.4.1.5 Particles
Otherwise, if {content type} is
empty
create a grammar
Content
as follows:
Content
EE
If {content type} is a content model particle with mixed content, add a production for each non-terminal
Content
i , j
in
Content
as follows:
Content
i, j
CH
[untyped value]
Content
i, j
Note:
The value of each Characters event that has an [untyped value] is represented as a String (see
7.1.10 String
).
Then, create a copy
of each attribute use grammar
and create the grammar
Type
by combining this sequence of attribute use grammars and the
Content
grammar using the grammar concatenation operator defined in section
8.5.4.1.1 Grammar Concatenation Operator
as follows:
Type
⊕ … ⊕
n−1
Content
The
content
index of grammar
Type
created from an XML Schema complex type definition is the index of the first non-terminal symbol of
Content
within the context of
Type
8.5.4.1.3.3 Complex Ur-Type Grammar
XML Schema
[XML Schema Structures]
defines a
complex ur-type
XS1
called
xsd:anyType
XS1
, which is the default type for declared elements when no type is specified in the declaration. The type xsd:anyType can be used as the type of declared elements in schemas, or as the explicit type given to elements by means of xsi:type attribute in
schema-informed EXI streams
When schemas are available to describe the body of an EXI stream, create
grammars
Type
ur-type
and
TypeEmpty
ur-type
that are used to process the element contents of type
xsd:anyType
XS1
as shown below.
Type
ur-type,
AT (*)
Type
ur-type,
SE(*)
Type
ur-type,
EE
CH
Type
ur-type,
Type
ur-type,
SE(*)
Type
ur-type,
EE
CH
Type
ur-type,
TypeEmpty
ur-type,
AT (*)
TypeEmpty
ur-type,
EE
TypeEmpty
ur-type,
EE
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: AT (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the attribute matched by AT (*)
If
qname
is xsi:type, let
target-type
be the value of the xsi:type attribute and assign it the QName datatype representation (see
7.1.7 QName
). If there is no namespace in scope for the specified qname prefix, set the
uri
of
target-type
to empty ("") and the
localName
to the full lexical value of the QName, including the prefix. Encode
target-type
according to section
7. Representing Event Content
. If a grammar can be found for the
target-type
type using the encoded
target-type
representation, evaluate the element contents using the grammar for
target-type
type instead of
RightHandSide
If
qname
is xsi:nil, let
nil
be the value of the xsi:nil attribute. If
nil
is a valid Boolean value, assign it the Boolean datatype representation (see
7.1.2 Boolean
) and encode it according to section
7. Representing Event Content
. If
nil
is not a valid Boolean value, represent the
xsi:nil attribute event using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
). If the value of
nil
is true, evaluate the element contents using the grammar

for the
TypeEmpty
ur-type
type defined above rather than
RightHandSide
If a global attribute definition exists for
qname
, let
global-type
be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated with
global-type
, it SHOULD be represented using the datatype representation associated with
global-type
(see
7. Representing Event Content
). If the attribute value is not represented using the datatype representation associated with
global-type
, represent the
attribute event
using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
).
In a schema-informed grammar, all productions of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of event sequence using
RightHandSide
The
content
index of
grammars
Type
ur-type
and
TypeEmpty
ur-type
are both 1 (one).
8.5.4.1.4 Attribute Uses
Given an attribute use
with properties {required} and {attribute declaration}, where {attribute declaration} has properties {name}, {target namespace}, {type definition} and {scope}, generate a new EXI grammar
Attribute
for evaluating attributes in the specified {scope} with qname local-name = {name} and qname uri = {target namespace}
where
qname
is the
qname
of the attributes.
Add the following grammar productions to
Attribute
Attribute
i, 0
AT(
qname
) [schema-typed value]
Attribute
i, 1
Attribute
i, 1
EE
If the {required} property of
is false, add the following grammar production to indicate this attribute occurrence may be omitted from the content model.
Attribute
i, 0
EE
Note:
Productions of the form
LeftHandSide
: AT(
qname
) [schema-typed value]
RightHandSide
represent typed attributes that occur in schema-valid contexts with values that can be represented using the EXI datatype representation associated with the attribute's {type definition} (see
7. Representing Event Content
).
Attributes that occur in schema-valid contexts that can be represented using the EXI datatype representation associated with the attribute's {type definition}, SHOULD be represented this way. Attributes that are not represented this way, are represented using the alternate forms of AT events described in section
8.5.4.4 Undeclared Productions
8.5.4.1.5 Particles
Given
an XML Schema
particle
XS1
with {min occurs}, {max occurs} and {term} properties, generate a grammar
Particle
for evaluating instances of
as follows.
If {term} is an element declaration, generate the grammar
Term
according to section
8.5.4.1.6 Element Terms
. If {term} is a wildcard, generate the grammar
Term
according to section
8.5.4.1.7 Wildcard Terms
Wildcard Terms. If {term} is a model group, generate the grammar
Term
according to section
8.5.4.1.8 Model Group Terms
Create {min occurs} copies of
Term
, …,
{min occurs}-1
If {max occurs} is not unbounded, create {max occurs} – {min occurs} additional copies of
Term
{min occurs}
{min occurs}+1
, …,
{max occurs}-1
Add the following productions to each of the grammars that do not already have a production of this form.
i, 0
EE where {min occurs} ≤
< {max occurs}
indicating these instances of
Term
may be omitted from the content model. Then, create the grammar for
Particle
using the grammar concatenation operator defined in section
8.5.4.1.1 Grammar Concatenation Operator
as follows:
Particle
⊕ … ⊕
{max occurs}-1
Otherwise, if {max occurs} is unbounded, generate one additional copy of
Term
{min occurs}
and replace all productions of the form:
{min occurs}, k
EE
with productions of the form:
{min occurs}, k
{min occurs}, 0
indicating this term may be repeated indefinitely. Then if there is no production of the form:
{min occurs}, 0
EE
add one after the other productions with the non-terminal
{min occurs}, 0
on the left-hand side, indicating this term may be omitted from the content model. Then, create the grammar for
Particle
using the grammar concatenation operator defined in section
8.5.4.1.1 Grammar Concatenation Operator
as follows:
Particle
⊕ … ⊕
{min occurs}
8.5.4.1.6 Element Terms
Given a particle {term}
PT
that is an XML Schema
element declaration
XS1
with properties {name} and {target namespace}, let
be the set of element declarations that directly or indirectly reaches the element declaration
PT
through the chain of {substitution group affiliation} property of the elements, plus
PT
itself if was not in the set. Sort the element declarations in
lexicographically first by {name} then by {target namespace}, which makes a sorted list of element declarations
, …
n−1
where
is the cardinality of
. Then create the grammar
ParticleTerm
with the following grammar productions:
Syntax:
ParticleTerm
i, 0
SE(
qname
ParticleTerm
i, 1
SE(
qname
ParticleTerm
i, 1
SE(
qname
n−1
ParticleTerm
i, 1
ParticleTerm
i, 1
EE
Note:
In the productions above,
qname
(where 0 ≤
< n) represents a
qname
of which local-name and uri are {name} property and {target namespace} property of the element declaration
, respectively.
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: SE(
qname
RightHandSide
are evaluated as follows:
Evaluate the element contents using the SE(
qname
) grammar.
Evaluate the remainder of the event sequence using
RightHandSide
8.5.4.1.7 Wildcard Terms
Given a particle {term}
PT
that is an XML Schema
wildcard
XS1
with property {namespace constraint}, a grammar that reflects the wildcard definition is created as follows.
Create a grammar
ParticleTerm
containing the following grammar production:
ParticleTerm
i, 1
EE
When the wildcard's {namespace constraint} is
any
, or a pair of
not
and either a namespace name or the special value
absent
indicating no namespace,
add the following production to
ParticleTerm
ParticleTerm
i, 0
SE(*)
ParticleTerm
i, 1
Otherwise, that is, when {namespace constraint} is a set of values whose members are namespace names or the special value
absent
indicating no namespace,

add the following production to
ParticleTerm
ParticleTerm
i, 0
SE(
uri
: *)
ParticleTerm
i, 1
for each member value
uri
in {namespace constraint},

provided that it is the empty string (i.e. "") that is used as
uri
when the member value is the special value
absent

Each
uri
is used to augment the uri partition of the String table.
Section
7.3.1 String Table Partitions
describes how these
uri
strings are put into String table for pre-population.
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: Terminal
RightHandSide
where Terminal is one of SE (*) or SE (
uri
: *) are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
or SE(
uri
: *)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of the event sequence using
RightHandSide
8.5.4.1.8 Model Group Terms
8.5.4.1.8.1 Sequence Model Groups
Given a particle {term}
PT
that is a model group with {compositor} equal to "sequence" and a list of
{particles}
, …,
n−1
, create a grammar
ParticleTerm
as follows:
If the value of
is 0, add the following productions to the grammar
ParticleTerm
ParticleTerm
i, 0
EE
Otherwise, generate a sequence of grammars
Particle
Particle
, …,
Particle
n−1
corresponding to the list of particles
, …,
n−1
according to section
8.5.4.1.5 Particles
. Then combine the sequence of grammars using the grammar concatenation operator defined in section
8.5.4.1.1 Grammar Concatenation Operator
as follows:
ParticleTerm
Particle
Particle
⊕ … ⊕
Particle
n−1
8.5.4.1.8.2 Choice Model Groups
Given a particle {term}
PT
that is a model group with {compositor} equal to "choice" and a list of
{particles}
, …,
n−1
, create a grammar
ParticleTerm
as follows:
If the value of
is 0, add the following productions to the grammar
ParticleTerm
ParticleTerm
i, 0
EE
Otherwise, generate a sequence of grammar productions
Particle
Particle
, …,
Particle
n−1
corresponding to the list of particles
, …,
n−1
according to section
8.5.4.1.5 Particles
. Then create the grammar
ParticleTerm
with the following grammar productions:
ParticleTerm
i, 0
Particle
0, 0
Particle
1, 0
Particle
n−1, 0
indicating the grammar for the term may accept any one of the given {particles}.
8.5.4.1.8.3 All Model Groups
Given a particle {term}
PT
that is a model group with {compositor} equal to "all" and a list of
{ particles }
, ...,
n−1
, create a grammar
ParticleTerm
as follows:
Add the following production to the grammar
ParticleTerm
ParticleTerm
i, 0
EE
If the value of
is not 0, generate a sequence of grammar productions
Particle
Particle
, …,
Particle
n−1
corresponding to the list of particles
, …,
n−1
according to section
8.5.4.1.5 Particles
Replace all productions of the form:
Particle
j , k
EE
with productions of the form:
Particle
j , k
ParticleTerm
i, 0
where 0 ≤
, and 0 ≤
with
denoting the number non-terminals in the grammar
Particle
Add the following productions to the grammar
ParticleTerm
ParticleTerm
i, 0
Particle
0, 0
Particle
1, 0
Particle
n−1, 0
Note:
The grammar above can accept any sequence of the given {particles} in any order. This grammar is intentionally simple and succinct, enabling high-performance, low-footprint implementations on a wide range of devices, including those with very limited memory resources. More elaborate and precise grammars for the "all" group are possible; however, the associated improvement in precision is not sufficient to justify their code-footprint and memory resource requirements.
8.5.4.2 EXI Normalized Grammars
This section describes the process for converting an EXI proto-grammar
generated
from an XML Schema in accordance with section
8.5.4.1 EXI Proto-Grammars
into an EXI normalized grammar. Each production in an EXI normalized grammar has exactly one non-terminal symbol on the left-hand side and one terminal symbol on the right-hand side followed by at most one non-terminal symbol on the right-hand side. In addition, EXI normalized grammars contain no two grammar productions with the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side. This is a restricted form of Greibach normal form
[Greibach Normal Form]
EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:
LeftHandSide
RightHandSide
where
LeftHandSide
and
RightHandSide
are both non-terminals. Therefore, the first step of the normalization process focuses on replacing productions in this form with productions that conform to the EXI normalized grammar rules. This process can produce a grammar that has more than one production with the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side. Therefore, the second step focuses on eliminating such productions.
The first step of the normalization process is described in Section
8.5.4.2.1 Eliminating Productions with no Terminal Symbol
. The second step is described in section
8.5.4.2.2 Eliminating Duplicate Terminal Symbols
. Once these two steps are completed, the grammar will be an EXI normalized grammar.
8.5.4.2.1 Eliminating Productions with no Terminal Symbol
Given an EXI proto-grammar
, with non-terminals
i, 0
i, 1
, …,
i, n−1
, replace each production of the form:
i, j
i, k
where 0 ≤
and 0 ≤
with a set of productions:
i, j
RHS
i, k
RHS
i, k
RHS
i, k
m-1
where
RHS
i, k
RHS
i, k
, …,
RHS
i, k
m-1
represents the right-hand side of each production in
that has the non-terminal
i, k
on the left-hand side and
is the number of such productions.
Remove such productions if any among
i, j
RHS
i, k
where 0 ≤
of which the right-hand side either is identical to the left-hand side, or has previously been replaced while applying the process described in this section to productions with
i, j
on the left-hand side.
Repeat this process until there are no more
productions
of the form:
i, j
i, k
where 0 ≤
and 0 ≤
in the grammar
8.5.4.2.2 Eliminating Duplicate Terminal Symbols
Given an EXI proto-grammar
, with non-terminals
i, 0
i, 1
, …,
i, n−1
, identify all pairs of productions that have the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side of the form:
i, j
Terminal
i, k
Terminal
i, l
where
and Terminal represents a particular terminal symbol and replace them with a single production:
i, j
Terminal
i, k ⊔ l
where
i, k ⊔ l
is a distinct non-terminal that accepts the inputs accepted by
i, k
and the inputs accepted by
i, l

Here the notation " k ⊔ l " denotes a union set of integers and is used to uniquely identify the index of such a non-terminal.
When
is a type grammar, if both
and
are smaller than
content
index of
, k ⊔ l is also considered to be smaller than
content
for the purpose of index comparison purposes. Otherwise, if either
or
is not smaller than
content
, k ⊔ l is considered to be larger than
content
If the non-terminal
i, k ⊔ l
does not exist, create it as follows:
i, k ⊔ l
RHS
i, k
RHS
i, k
RHS
i, k
m-1
RHS
i, l
RHS
i, l
RHS
i, l
n−1
where
RHS
i, k
RHS
i, k
…,
RHS
i, k
m-1
and
RHS
i, l
RHS
i, l
…,
RHS
i, l
n−1
represent the right-hand side of each production in the Grammar
that has the non-terminals
j, k
and
j, l
on the left-hand side respectively and
and
are the number of such productions.
Repeat this process until there are no more productions in the grammar
of the form:
i, j
Terminal
i, k
Terminal
i, l
Then, identify any identical productions of the following form:
i, j
Terminal
i, k
Terminal
i, k
where 0 ≤
is the number of productions in
and Terminal represents a specific terminal symbol, then remove one of them until there are no more productions remaining in the grammar
of this form.
8.5.4.3 Event Code Assignment
This section describes the process for assigning unique
event codes
to each production in a normalized EXI grammar. Given a normalized EXI grammar
, apply the following process to each unique non-terminal
i, j
that occurs on the left-hand side of the productions in
where 0 ≤
and
is the number of such non-terminals in
Sort all productions with
i, j
on the left-hand side in the following order:
all productions with AT(
qname
) on the right-hand side
sorted lexicographically by
qname
local-name, then by
qname
uri, followed by
all productions with AT(
uri
: *) on the right-hand side sorted lexicographically by
uri
, followed by
any production with AT (*) on the right-hand side, followed by
all productions with SE(
qname
) on the right-hand side sorted in schema order, followed by
all productions with SE(
uri
: *) on the right-hand side sorted in schema order, followed by
any production with SE(*) on the right-hand side, followed by
any production with EE on the right-hand side, followed by
any production with CH on the right-hand side.
In step 4 and step 5, the schema order of productions with SE(
qname
) and SE(
uri
: *) on the right-hand side is determined by the order of the corresponding particles in the schema after any references to named model groups in the schema are expanded in place with the group definitions themselves. A content model of a complex type can be seen as a tree that consists of particles where particles of either element declaration terms or wildcard terms appear as leaves, and the order is assigned to those leaf particles by traversing the tree by depth-first method.
Given the sorted list of productions
, …
with the non-terminal
i, j
on the left-hand side, assign
event codes
to each of the productions as follows:
Productions
Event Code
n−1
−1
8.5.4.4 Undeclared Productions
The normalized element and type grammars
generated
from a schema describe the sequences of child elements, attributes and character events that may occur in a particular EXI stream. However, there are additional events that may occur in an EXI stream that are not described by the schema, for example events representing comments, processing-instructions, schema deviations, etc.
This section first describes the process for, in cases with
strict option
value set to false, augmenting the normalized element and type grammars with productions that describe events that may occur in the EXI stream, but are not explicitly declared in the schema. It then describes the way, in cases with
strict option
value set to true, normalized element and type grammars are supplemented with productions to be prepared for the occurrences of xsi:type and xsi:nil attributes that are permitted by the schema.
In the normalized grammars, terminal symbols AT and CH represent attribute and character events that can be represented by the EXI datatype representations associated with their schema datatypes (see
7. Representing Event Content
).
When the
strict option
is false, additional untyped AT and CH terminal symbols are added that can be used for representing attributes and character events that cannot be represented by the associated EXI datatype representations (e.g., schema-invalid values). The following table shows the notation used for such AT and CH terminals along with their definitions.
Notation
Definition
AT (
qname
) [untyped value]
Terminal symbol that matches an attribute

event with
qname
qname
and an untyped value.
AT (*) [untyped value]
Terminal symbol that matches an attribute

event with any
qname
and an untyped value.
CH [untyped value]
Terminal symbol that matches a characters

event with an untyped value.
8.5.4.4.1 Adding Productions when Strict is False
This section describes the process for augmenting the normalized grammars when the value of the
strict option
is false. For each normalized element grammar
Element
, create a copy
Element
i, content2
of
Element
i, content
where the index "content" is the
content
of the type of the element from which
Element
was created. Then, apply the following procedures.
Add the following production to each non-terminal
Element
i, j
that does not already include a production of the form
Element
i, j
: EE, such that 0 ≤ j ≤ content.
Syntax
Event Code
Element
i, j
EE
where
represents the next available
event code
with length 2.
Let
be the element declaration from which
Element
was created and
be the {type definition} of
. Let
Type
and
TypeEmpty
be the type grammars created from
(see section
8.5.4.1.3 Type Grammars
). Add the following productions to
Element
Syntax
Event Code
Element
i, 0
AT(xsi:type)
Element
i, 0
AT(xsi:nil)
Element
i, 0
.(
+1)
where
represents the next available
event code
with length 2.
Note:
When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other
attribute
events of the same element, with xsi:type placed before xsi:nil when they both occur.
Semantics:
When using schemas, all productions of the form
LeftHandSide
: AT (xsi:type)
RightHandSide
are evaluated as follows:
Let
target-type
be the value of the xsi:type attribute and assign it the QName datatype representation (see
7.1.7 QName
).
If there is no namespace in scope for the specified
qname
prefix, set the
uri
of
target-type
to empty ("") and the
localName
to the full lexical value of the QName, including the prefix.
Represent the value of
target-type
according to section
7. Representing Event Content
If a grammar can be found for the
target-type
type using the encoded
target-type
representation,
evaluate the element contents using the grammar for
target-type
type instead of
RightHandSide
When using schemas, productions of the form
LeftHandSide
: AT (xsi:nil)
RightHandSide
are evaluated as follows:
Let
nil
be the value of the xsi:nil attribute.
If
nil
is a valid Boolean, assign it the Boolean datatype representation (see
7.1.2 Boolean
) and encode it according to section
7. Representing Event Content
. If
nil
is not a valid Boolean, represent the
xsi:nil attribute
event using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
).
If the value of
nil
is true, evaluate the element contents using the grammar
TypeEmpty
defined above rather than
RightHandSide
For each non-terminal
Element
i, j
, such that 0 ≤ j ≤ content , with zero or more productions of the following form:
Element
i, j
AT (
qname
) [schema-typed value]
NonTerminal
AT (
qname
) [schema-typed value]
NonTerminal
AT (
qname
-1
) [schema-typed value]
NonTerminal
x-1
where
represents the number of attributes declared in the schema for this context, add the following productions:
Syntax
Event Code
Element
i, j
AT (*)
Element
i, j
AT (
qname
) [untyped value]
NonTerminal
.(
+1).0
AT (
qname
) [untyped value]
NonTerminal
.(
+1).1
AT (
qname
-1
) [untyped value]
NonTerminal
x-1
.(
+1).(
-1)
AT (*) [untyped value]
Element
i, j
.(
+1).(
where
represents the next available
event code
with length 2.
Note:
The value of each
attribute
event that has an [untyped value] is represented as a String (see
7.1.10 String
).
Like an element, an attribute may occur in a schema-invalid context, have a untyped (e.g., schema-invalid) value or both. However, unlike an element whose occurrence and value are represented by separate SE and CH events, the occurrence and value of an attribute are represented by a single AT event. Consequently, four kinds of AT
terminal symbols
are needed for the four possible representations of an attribute event
in schema-informed grammars.
The table below shows these four kinds of AT terminal symbols
along with the equivalent combinations of SE and CH
terminal symbols
for representing elements.
Table 8-1.
Equivalent terminal symbols
for different attribute and element representations
Schema-typed value
Untyped value
Schema-valid occurrence
AT (
qname
) [schema-typed value]
SE (
qname
) CH [schema-typed value]
AT (
qname
) [untyped value]
SE (
qname
) CH [untyped value]
Schema-invalid occurrence
AT (*)
SE (*) CH [schema-typed value]
AT (*) [untyped value]
SE (*) CH [untyped value]
When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other
attribute
events of the same element, with xsi:type placed before xsi:nil when they both occur.
Semantics:
In a schema-informed grammar,
all productions of the form
LeftHandSide
: AT (*) are evaluated as follows:
Let
qname
be the
qname
of the attribute matched by AT (*)
If
qname
is xsi:type, let
target-type
be the value of the xsi:type attribute and assign it the QName datatype representation (see
7.1.7 QName
). If there is no namespace in scope for the specified qname prefix, set the
uri
of
target-type
to empty ("") and the
localName
to the full lexical value of the QName, including the prefix. Encode
target-type
according to section
7. Representing Event Content
. If a grammar can be found for the
target-type
type using the encoded
target-type
representation, evaluate the element contents using the grammar for
target-type
type instead of
RightHandSide
If
qname
is xsi:nil, let
nil
be the value of the xsi:nil attribute. If
nil
is a valid Boolean value, assign it the Boolean datatype representation (see
7.1.2 Boolean
) and encode it according to section
7. Representing Event Content
. If
nil
is not a valid Boolean value, represent the
xsi:nil attribute event using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
). If the value of
nil
is true, evaluate the element contents using the grammar

for the
TypeEmpty
type defined above rather than
RightHandSide
If a global attribute definition exists for
qname
, let
global-type
be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated with
global-type
, it SHOULD be represented using the datatype representation associated with
global-type
(see
7. Representing Event Content
). If the attribute value is not represented using the datatype representation associated with
global-type
, represent the
attribute event
using the AT (*) [untyped value] terminal (see
8.5.4.4 Undeclared Productions
).
Add the following production to
Element
Syntax
Event Code
Element
i, 0
NS
Element
i, 0
where
represents the next available
event code
with length 2.
When the value of the
selfContained option
is true, add the following production to
Element
Syntax
Event Code
Element
i, 0
SC
Fragment
where
represents the next available
event code
with length 2.
Semantics:
All productions of the form
LeftHandSide
: SC
Fragment
are evaluated as follows:
Save the string table, grammars and any implementation-specific state learned while processing this EXI Body.
Initialize the string table, grammars and any implementation-specific state learned while processing this EXI Body to the state they held just prior to processing this EXI Body.
Skip to the next byte-aligned boundary in the stream
if it is not already at such a boundary.
Let
qname
be the
qname
of the SE event immediately preceding this SC event.
Let
content
be the sequence of events following this SC event that match the grammar for element
qname
, up to and including the terminating EE event.
Evaluate the sequence of events (SD, SE(
qname
),
content
, ED) according to the
Fragment
grammar. (see
8.5.2 Schema-informed Fragment Grammar
Skip to the next byte-aligned boundary in the stream
if it is not already at such a boundary.
Restore the string table, grammars and implementation-specific state learned while processing this EXI Body to that saved in step 1 above.
Add the following productions to each non-terminal
Element
i, j
, such that 0 ≤ j ≤ content .
Syntax
Event Code
Element
i, j
SE (*)
Element
i, content2
CH [untyped value]
Element
i, content2
.(
+1)
ER
Element
i, content2
.(
+2)
CM
Element
i, content2
.(
+3).0
PI
Element
i, content2
.(
+3).1
where
represents the next available
event code
with length 2.
Note:
Productions of the form
LeftHandSide
: CH [untyped value]
RightHandSide
match untyped character data represented as a String in the EXI stream (e.g., schema-invalid values). Character data represented using the datatype representation associated with the schema datatype of the character data are matched by productions of the form
LeftHandSide
: CH [schema-typed value]
RightHandSide
described in section
8.5.4.1.3.1 Simple Type Grammars
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of the event sequence using
RightHandSide
Add the following production to
Element
i, content2
and to each non-terminal
Element
i, j
that does not already include a production of the form
Element
i, j
: EE, such that content <
, where
is the number of non-terminals in
Element
Syntax
Event Code
Element
i, j
EE
where
represents the next available
event code
with length 2.
Add the following productions to
Element
i, content2
and to each non-terminal
Element
i, j
, such that content <
, where
is the number of non-terminals in
Element
Syntax
Event Code
Element
i, j
SE (*)
Element
i, j
CH [untyped value]
Element
i, j
.(
+1)
ER
Element
i, j
.(
+2)
CM
Element
i, j
.(
+3).0
PI
Element
i, j
.(
+3).1
where
represents the next available
event code
with length 2.
Semantics:
In a schema-informed grammar, all productions of the form
LeftHandSide
: SE (*)
RightHandSide
are evaluated as follows:
Let
qname
be the
qname
of the element matched by SE (*)
If a
global element grammar
does not exist for element
qname
, create one according to section
8.4.3 Built-in Element Grammar
Evaluate the element content using the
global element grammar
for element
qname
Evaluate the remainder of the event sequence using
RightHandSide
Apply the process described above for element grammars to each normalized type grammar
Type
and
TypeEmpty
8.5.4.4.2 Adding Productions when Strict is True
This section describes the process for augmenting the normalized grammars when the value of the
strict option
is true. For each normalized element grammar
Element
, apply the following procedures.
Let
be the element declaration from which
Element
was created and
be the {type definition} of
. If
either has named sub-types or is a simple type definition of which {variety} is
union
add the following production to
Element
Syntax
Event Code
Element
i, 0
AT(xsi:type)
Element
i, 0
where
represents the next available
event code
with length 2.
Semantics:
When using schemas, productions of the form
LeftHandSide
: AT (xsi:type)
RightHandSide
are evaluated as follows:
Let
target-type
be the value of the xsi:type attribute and assign it the QName datatype representation (see
7.1.7 QName
).
If there is no namespace in scope for the specified
qname
prefix, set the
uri
of
target-type
to empty ("") and the
localName
to the full lexical value of the QName, including the prefix.
Represent the value of
target-type
according to section
7. Representing Event Content
If a grammar can be found for the
target-type
type using the encoded
target-type
representation,
evaluate the element contents using the grammar for
target-type
type instead of
RightHandSide
Let
Type
and
TypeEmpty
be the type grammars created from
(see section
8.5.4.1.3 Type Grammars
). If the {nillable} property of
is true, add the following production to
Element
Syntax
Event Code
Element
i, 0
AT(xsi:nil)
Element
i, 0
where
represents the next available
event code
with length 2.
Semantics:
When using schemas, productions of the form
LeftHandSide
: AT (xsi:nil)
RightHandSide
are evaluated as follows:
Let
nil
be the value of the xsi:nil attribute.
If
nil
is a valid Boolean, assign it the Boolean datatype representation (see
7.1.2 Boolean
) and encode it according to section
7. Representing Event Content
. Otherwise, the value of the
nil
is schema-invalid and cannot be represented when the value of the
strict option
is true.
If the value of
nil
is true, evaluate the element contents using the grammar
TypeEmpty
defined above rather than
RightHandSide
Note:
When the value of the
strict option
is true, it is not possible to use xsi:type and xsi:nil attributes together on the same element
unless there is a wildcard attribute declared for the element.
This is due to the fact that xsi:type specifies a target type definition, but xsi:nil is only permitted on nillable elements, not type definitions.
When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other
attribute
events of the same element, with xsi:type placed before xsi:nil when they both occur.
9. EXI Compression
The use of EXI compression increases compactness utilizing additional computational resources. EXI compression combines knowledge of XML with a widely adopted, standard compression algorithm to achieve higher compression ratios than would be achievable by applying compression to the entire stream.
EXI compression is applied when
compression
is turned on or when
alignment
is set to
pre-compression
. Byte-aligned representations of
event codes
and
content items
are more amenable to compression algorithms compared to unaligned representations because most compression algorithms operate on series of bytes to identify redundancies in the octets. Therefore, when EXI compression is used,
event codes
and
content items
of EXI events are encoded as aligned bytes in accordance with
6.2 Representing Event Codes
and
7. Representing Event Content
EXI compression splits a sequence of EXI events into a number of contiguous blocks of events.
Events that belong to the same block are transformed into lower entropy groups of similar values called
channels
, which are individually well suited for standard compression algorithms. To reduce compression overhead, smaller channels are combined before compressing them, while larger channels are compressed independently. The criteria EXI compression uses to define and combine channels is intentionally simple to facilitate implementation, reduce processing overhead, and avoid the need to encode channel ordering or grouping information in the format. The figure below presents a schematic view of the steps involved in EXI compression.
Figure 9-1.
EXI Compression Overview
In the following sections,
9.1 Blocks
defines blocks and explains how EXI events are partitioned into blocks.
Section
9.2 Channels
defines channels, their organization as well as how a group of channels correlate to its corresponding block of events.
Section
9.3 Compressed Streams
describes how some channels are combined as needed in preparation for applying compression algorithms on channels.
9.1 Blocks
EXI compression partitions the sequence of EXI events into a sequence of one or more non-overlapping blocks. Each block preceding the final block contains the minimum set of consecutive events that result in exactly
blockSize
values
in its value channels (see
9.2.2 Value Channels
), where blockSize is the block size of the EXI stream (see
5.4 EXI Options
). The final block contains no more than blockSize
values
in its value channels.
9.2 Channels
Events inside each block are multiplexed into channels. The first channel of each block is the structure channel described in Section
9.2.1 Structure Channel
. The remaining channels in each block are value channels described in Section
9.2.2 Value Channels
The diagram below presents an exemplary view of the transformation in which events within a block are multiplexed into channels in one way and channels are demultiplexed into events in the other way.
Figure 9-2.
Multiplexing EXI events into channels
9.2.1 Structure Channel
The structure channel of each block defines the overall order and structure of the events in that block. It contains the
event codes
and associated content for each event in the block, except for Attribute (AT) and Character (CH)
values
which are stored in the value channels. In addition, there are two kinds of attribute events whose
values
are stored in the structure channel instead of in value channels. The
value
of each xsi:type attribute is stored in the structure channel. The
value
of each xsi:nil attribute that matches a schema-informed grammar production
and has a schema-valid value
is also stored in the structure channel. These attribute events are intrinsic to the grammar system thus are essential in processing the structure channel because their values affect the grammar to be used for processing the rest of the elements on which they appear. All
event codes
and content in the structure stream occur in the same order as they occur in the EXI event sequence.
9.2.2 Value Channels
The
values
of the Attribute (AT) and Character (CH) events in each block are organized into separate channels based on the
qname
of the associated attribute or element. Specifically, the
value
of each Attribute (AT) event is placed in the channel identified by the
qname
of the Attribute and the
value
of each Character (CH) event is placed in the channel identified by the
qname
of its parent Start Element (SE) event. Each block contains exactly one channel for each distinct element or attribute
qname
that occurs in the block. The
values
in each channel occur in the order they occur in the EXI event sequence.
9.3 Compressed Streams
The channels in a block are further organized into compressed streams. Smaller channels are combined into the same compressed stream, while others are each compressed separately. Below are the rules applied within the scope of a block used to determine the channels to be combined together, the order of the compressed streams and the order amongst the channels that are combined into the same compressed stream.
If the block contains at most 100
values
, the block will contain only 1 compressed stream containing the structure channel followed by all of the value channels. The order of the value channels within the compressed stream is defined by the order in which the first
value
in each channel occurs in the EXI event sequence.
If the block contains more than 100
values
, the first compressed stream contains only the structure channel. The second compressed stream contains all value channels that contain at most 100
values
. And the remaining compressed streams each contain only one channel, each having more than 100
values
. The order of the value channels within the second compressed stream is defined by the order in which the first
value
in each channel occurs in the EXI event sequence. Similarly, the order of the compressed streams following the second compressed stream in the block is defined by the order in which the first
value
of the channel inside each compressed stream occurs in the EXI event sequence.
Note:
EXI compression changes the order in which
event codes
and
value
s are read and written to and from an EXI stream.
EXI processors
must encode and decode
value
s in this revised order so order sensitive constructs like the string table (see
7.3 String Table
) work properly.
When the value of the
compression
option is set to true, each compressed stream in a block is stored using the standard DEFLATE Compressed Data Format defined by RFC 1951
[IETF RFC 1951]
. Otherwise, when the value of the
alignment
option is set to
pre-compression
, each compressed stream in a block is stored directly without the DEFLATE algorithm.
10. Conformance
10.1 EXI Stream Conformance
[Definition:]
conformant EXI stream
consists of a sequence of octets that follows the syntax of
EXI stream
that is defined in this document.
[Definition:]
EXI format provides a way to involve user-defined datatype representations in EXI streams processing, which is an extension point that, when used in conjunction with relevant datatype representations specifications external to this document, leads to the formulation of
Extended EXI streams
Conformance of extended EXI streams is relative to the syntax defined by the relevant user-defined datatype representations specifications. The definitions of user-defined datatype representations syntax are out of the scope of this document.
[Definition:]
An extended EXI stream is a
conformant extended EXI stream
if replacing value items represented using user-defined datatype representations with their intrinsic representations would make the stream a
conformant EXI stream
An extended EXI stream described as an "EXI stream with regards to datatype representations
" where
is the set of datatype representations can be processed by an
EXI stream decoder
only if the processor has the shared knowledge about each one of the datatype representations in the set
The structural syntax of
EXI streams
and
extended EXI streams
is described by the abstract EXI grammar system defined in this document. Although this document specifies the normative way in which XML Schema schemas are mapped into the EXI grammar system to make schema-informed grammars, EXI allows the use of other schema languages to process EXI streams or extended EXI streams so far as there is a well known EXI grammar binding of the schema language and the binding preserves the semantics of the EXI grammar system. EXI streams or extended EXI streams generated using schemas of such schema language are still conformant. The definitions of grammar bindings for schema languages other than XML Schema are out of the scope of this document, and each schema language community is encouraged to define its own binding in order to make it possible to harness the utmost efficiency out of EXI when schemas of the language are available.
10.2 EXI Processor Conformance
The conformance of EXI Processors is defined separately for each of the two processor roles,
EXI stream encoders
and
EXI stream decoders
; the conformance of the former is described in terms of the conformance of the
EXI streams
or
extended EXI streams
that they produce, while that of the latter is based on the set of format features that EXI stream decoders are prepared
for in the processing of
conformant EXI streams
or
conformant extended EXI streams
An
EXI stream encoder
is conformant if and only if it is capable of generating
conformant EXI streams
or
conformant extended EXI streams
given any input structured data it is made to work on.
On the other hand,
EXI stream decoders
MUST support all format features described in this document as they are explained, except for the capability of handling
Datatype Representation Map
which is an optional feature.
EXI stream decoders that do not implement
Datatype Representation Map
feature MUST report an error with a meaningful message upon encountering a
"datatypeRepresentationMap"
element while processing
EXI options documents
in
EXI headers
A References
A.1 Normative References
IETF RFC 1951
DEFLATE Compressed Data Format Specification version 1.3
, P. Deutsch, Author. Internet
Engineering Task Force, May 1996. Available at
IETF RFC 2119
Key words for use in RFCs to Indicate
Requirement Levels
, S. Bradner, Author. Internet
Engineering Task Force, June 1999. Available at
IETF RFC 3023
XML Media Types
M. Murata, S. St.Laurent and D. Kohn, Author. Internet
Engineering Task Force, January 2001. Available at
ISO/IEC 10646
ISO/IEC 10646-1:2000. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane
and
ISO/IEC 10646-2:2001. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 2: Supplementary Planes
, as, from time to time, amended, replaced by a new edition or expanded by the addition of new parts. [Geneva]: International Organization for Standardization. (See
for the latest version.)
XML 1.0
Extensible Markup Language (XML) 1.0 (Fifth Edition)
T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, Editors.
World Wide Web Consortium, 10 February 1998, revised 26 November 2008.
This version is http://www.w3.org/TR/2008/REC-xml-20081126.
The latest version is available at
XML 1.1
Extensible Markup Language (XML) 1.1 (Second Edition)
T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, and J. Cowan, Editors.
World Wide Web Consortium, 04 February 2004, revised 16 August 2006.
This version is http://www.w3.org/TR/2006/REC-xml11-20060816.
The latest version is available at
Namespaces in XML
Namespaces in XML 1.0 (Second Edition)
T. Bray, D. Hollander, A. Layman, and R. Tobin, Editors.
World Wide Web Consortium, 14 January 1999, revised 16 August 2006.
This version is http://www.w3.org/TR/2006/REC-xml-names-20060816 .
The latest version is available at
XML Information Set
XML Information Set (Second Edition)
J. Cowan and R. Tobin, Editors. World Wide Web Consortium,
24 October 2001, revised 4 February 2004.
This version is http://www.w3.org/TR/2004/REC-xml-infoset-20040204.
The latest version is available at
XML Schema Structures
XML Schema Part 1: Structures Second
Edition
, H. Thompson, D. Beech, M. Maloney, and
N. Mendelsohn, Editors. World Wide Web Consortium, 2 May
2001, revised 28 October 2004.
This version is http://www.w3.org/TR/2004/REC-xmlschema-1-20041028.
The latest version is available at
XML Schema Datatypes
XML Schema Part 2: Datatypes Second
Edition
, P. Byron and A. Malhotra,
Editors. World Wide Web Consortium, 2 May 2001, revised 28
October 2004.
This version is http://www.w3.org/TR/2004/REC-xmlschema-2-20041028.
The latest version is available at
A.2 Other References
Efficient XML
Efficient XML
, part of
[EXI Measurements Note]
independently referenced.
The latest version is available at
EXI Evaluation Note
Efficient XML Interchange Evaluation
Carine Bournez, Editor.
World Wide Web Consortium.
The latest version is available at
EXI Impacts Note
Efficient XML Interchange (EXI) Impacts
Jaakko Kangasharju, Editor.
World Wide Web Consortium.
The latest version is available at
EXI Measurements Note
Efficient XML Interchange Measurements Note
Greg White, Jaakko Kangasharju, Don Brutzman and Stephen Williams, Editors.
World Wide Web Consortium.
The latest version is available at
EXI Primer
Efficient XML Interchange (EXI) Primer
Daniel Peintner, Santiago Pericas-Geertsen, Editors.
World Wide Web Consortium.
The latest version is available at
Greibach Normal Form
A New Normal-Form Theorem for Context-Free Phrase Structure Grammars
Sheila A. Greibach, Author.
Journal of the ACM Volume 12 Issue 1, January 1965, pp. 42–52.
Huffman Coding
A Method for the Construction of
Minimum-Redundancy Codes
, D. A. Huffman,
Author. Proceedings of the I.R.E., September 1952, pp.
1098-1102.
IEEE 754-2008
IEEE Standard for Floating-Point Arithmetic
ISO/IEC 19757-2:2003
Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG
SOAP 1.2
SOAP Version 1.2 Part 1: Messaging
Framework
, M. Gudgin, M. Hadley, N. Mendelsohn,
J-J. Moreau, H. Frystyk Nielsen, Editors. World Wide Web
Consortium, 24 June 2003.
This version is http://www.w3.org/TR/2003/REC-soap12-part1-20030624/.
The latest version is available at
XBC Measurement Methodologies
XML Binary Characterization Measurement
Methodologies
, S. D. Williams and P. Haggar,
Editors. World Wide Web Consortium, 31 March 2005.
This version is http://www.w3.org/TR/2005/NOTE-xbc-measurement-20050331/.
The latest version is available at
XBC Use Cases
XML Binary Characterization Use Cases
Mike Cokus and Santiago Pericas-Geertsen, Editors.
World Wide Web Consortium, 31 March 2005.
This version is http://www.w3.org/TR/2005/NOTE-xbc-use-cases-20050331/.
The latest version is available at
XBC Properties
XML Binary Characterization Properties
Mike Cokus and Santiago Pericas-Geertsen, Editors.
World Wide Web Consortium, 31 March 2005.
This version is http://www.w3.org/TR/2005/NOTE-xbc-properties-20050331/
The latest version is available at
B Infoset Mapping
This appendix contains the mappings between the XML Information
Set
[XML Information Set]
model and the EXI format.
Starting from the document information item,
each
information item
definition is mapped to its respective
unordered set of EXI event types
(see
Table 4-1
).
The actual order amongst information set items when it is relevant reflects the occurrence order of EXI events or their references in an EXI stream that correlate to the infoset items. As used in the XML Information
Set specification, the Infoset property names are shown in square
brackets,
[thus]
Note:
As has been prescribed in section
2. Design Principles
, EXI is designed to be compatible with the XML Information Set. While this approach is both legitimate and practical for designing a succinct format interoperable with XML family of specifications and technologies, it entails that some lexical constructs of XML not recognized by the XML Information Set are not represented by EXI, either. Examples of such unrepresented lexical constructs of XML include white space outside the document element, white space within tags, the kind of quotation marks (single or double) used to quote attribute values, and the boundaries of CDATA marked sections.
No constructs in EXI format facilitate the representation of
[character encoding scheme]
[standalone]
and
[version]
properties which are available in the definition of Document Information Item of XML Information Set
(see
B.1 Document Information Item
). EXI is made agnostic about
[character encoding scheme]
and
[version]
properties as they are in XML Information Set, and considers them to be the properties of
XML serializers in use. EXI forgoes
[standalone]
property
because simply having no references to any external markup declarations practically
serves the purpose with less complexity.
B.1 Document Information Item
A document information item maps to a pair of
Start Document (SD)
and
End Document (ED) events
with each of its properties subject to further mapping as shown in the following table.
Table B-1. Mapping between the document information item properties to EXI event types
Property
EXI event types
[children]
CM* PI* DT? [SE, EE]
[document element]
[SE, EE]
[notations]
Computed based on
text
content item of DT
to which each notation information set item maps.
[unparsed entities]
Computed based on
text
content item of DT
to which each unparsed entity information set item maps.
[base URI]
The base URI of the EXI stream
[character encoding scheme]
N/A
[standalone]
Not available
[version]
Not available
[all declarations processed]
True if all declarations contained directly or indirectly in DT are processed, otherwise false, which is the processor quality as opposed to the information provided by the format.
B.2 Element Information Items
An element information item maps to a pair of a
Start Element (SE)
event and the corresponding
End Element (EE)
event with each of its properties subject to further mapping as shown in the following table.
Table B-2. Mapping of the element information item properties to EXI event types
Property
EXI event types
[namespace name]
SE
[local name]
SE
[prefix]
SE
[children]
[SE, EE]* PI* CM* CH* ER*
[attributes]
AT*
[namespace attributes]
NS*
[in-scope namespaces]
The namespace information items computed using the
[namespace attributes]
properties of this information item and its ancestors
[base URI]
The base URI of the element information item
[parent]
Computed based on the last SE event encountered that did
not get a matching EE event if any, or computed based on the SD event
B.3 Attribute Information Item
An attribute information item maps to an
Attribute (AT)
event with each of its properties subject to further mapping as shown in the following table.
Table B-3. Mapping of the attribute information item properties to EXI event types
Property
EXI event types
[namespace name]
AT
[local name]
AT
[prefix]
AT
[normalized value]
The
value
of AT
[specified]
True if the item maps to AT, otherwise false
[attribute type]
Computed based on AT and DT
[references]
Computed based on
[attribute type]
and
value
of AT
[owner element]
Computed based on the last SE event encountered that did
not get a matching EE event
B.4 Processing Instruction Information Item
A processing instruction information maps to a
Processing Instruction (PI)
event with each of its properties subject to further mapping as shown in the following table.
Table B-4. Mapping of the processing instruction information item properties to EXI event types
Property
EXI event types
[target]
PI
[content]
PI
[base URI]
The base URI of the processing information item
[notation]
Computed based on the availability of the internal DTD subset
[parent]
Computed based on the last SE event encountered that did
not get a matching EE event type
B.5 Unexpanded Entity Reference Information item
An unexpanded entity reference information item maps to an
Entity Reference (ER) event
with each of its properties subject to further mapping as shown in the following table.
Table B-5. Mapping of the entity reference information item properties to
the EXI event types
Property
EXI event types
[name]
ER
[system identifier]
Based on the availability of the internal DTD subset
[public identifier]
Based on the availability of the internal DTD subset
[declaration base URI]
The base URI of the unexpanded entity reference information item
[parent]
Computed based on the last SE event encountered that did
not get a matching EE event type
B.6 Character Information item
A character information item maps to the individual characters contained in a
Characters (CH)
event following a SE event that did not get a matching EE event.
Table B-6. Mapping of the character information item properties and the EXI event types
Property
EXI event types
[character code]
Each character in CH
[element content whitespace]
Computed based on
[parent]
and DT
[parent]
Computed based on the last SE event encountered that did
not get a matching EE event
B.7 Comment Information item
A comment information item maps to a
Comment (CM)
event with each of its properties subject to further mapping as shown in the following table.
Table B-7. Mapping of the comment information item properties and the EXI event types
Property
EXI event types
[content]
text
content item of CM
[parent]
Computed based on the last SE event encountered that did
not get a matching EE event, or the SD event
B.8 Document Type Declaration Information item
A document type declaration information item maps to a
DOCTYPE (DT)
event with each of its properties subject to further mapping as shown in the following table.
Table B-8. Mapping of the document type declaration information item properties to the EXI event types
Property
EXI event types
[system identifier]
DT
[public identifier]
DT
[children]
Computed based on
text
content item of DT
[parent]
Computed based on the SD event
B.9 Unparsed Entity Information Item
An unparsed entity information item maps to part of the
text
content item of
DOCTYPE (DT)
event with each of its properties subject to further mapping as shown in the following table.
Table B-9. Mapping of the unparsed entity information item properties to EXI event types
Property
EXI event types
[name]
Computed based on
text
content item of DT
[system identifier]
Computed based on
text
content item of DT
[public identifier]
Computed based on
text
content item of DT
[declaration base URI]
The base URI of the unparsed entity information item
[notation name]
Computed based on
text
content item of DT
[notation]
Computed based on
text
content item of DT
B.10 Notation Information Item
An notation information item maps to part of the
text
content item of
DOCTYPE (DT)
event with each of its properties subject to further mapping as shown in the following table.
Table B-10. Mapping of the notation information item properties to EXI event types
Property
EXI event types
[name]
Computed based on
text
content item of DT
[system identifier]
Computed based on
text
content item of DT
[public identifier]
Computed based on
text
content item of DT
[declaration base URI]
The base URI of the notation information item
B.11 Namespace Information Item
An namespace information item
maps to a Namespace Declaration (NS)
event with each of its properties subject to further mapping as shown in the following table.
Table B-11. Mapping of the namespace information item properties to EXI event types
Property
EXI event types
[prefix]
NS
[namespace name]
NS
C XML Schema for EXI Options Document
The following schema describes the EXI options header. It is
designed to produce smaller headers for option combinations used when
compactness is critical.
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">

processContents="skip" />

minOccurs="0" maxOccurs="unbounded">

Note:
The
qnames
exi:ieeeBinary32 and exi:ieeeBinary64 defined above are reserved for future use in Datatype Representation Maps to identify the 32-bit and 64-bit Binary Interchange Formats defined by the IEEE 754-2008 standard
[IEEE 754-2008]
D Initial Entries in String Table Partitions
D.1 Initial Entries in Uri Partition
The following table lists the entries that are initially populated in uri partitions, where partition name URI denotes that they are entries in the uri partition.
Table D-1. Initial values in
uri
partition
Partition
String ID
String Value
URI
"" [empty string]
URI
"http://www.w3.org/XML/1998/namespace"
URI
"http://www.w3.org/2001/XMLSchema-instance"
When XML Schemas are used to inform the grammars for processing EXI body, there is an additional entry that is appended to the uri partition.
Table D-2. Additional entry when XML Schemas are used
Partition
String ID
String Value
URI
"http://www.w3.org/2001/XMLSchema"
D.2 Initial Entries in Prefix Partitions
The following table lists the entries that are initially populated in prefix partitions,
where XML-PF represents the partition for
prefixes
in
the
"http://www.w3.org/XML/1998/namespace"
namespace and XSI-PF
represents the partition for
prefixes
in the
"http://www.w3.org/2001/XMLSchema-instance"
namespace.
Table D-3. Initial
prefix
string table entries
Partition
String ID
String Value
""
"" [empty string]
XML-PF
"xml"
XSI-PF
"xsi"
D.3 Initial Entries in Local-Name Partitions
The following tables list the entries that are initially populated in local-name partitions,
where XML-NS represents the partition for
local-names
in the
"http://www.w3.org/XML/1998/namespace"
namespace, XSI-NS
represents the partition for
local-names
in the
"http://www.w3.org/2001/XMLSchema-instance"
namespace, and XSD-NS
represents the partition for
local-names
in the
"http://www.w3.org/2001/XMLSchema"
namespace.
Table D-4. Initial
local-name
string table entries
Partition
String ID
String Value
XML-NS
"base"
XML-NS
"id"
XML-NS
"lang"
XML-NS
"space"
XSI-NS
"nil"
XSI-NS
"type"
When XML Schemas are used to inform the grammars for processing EXI body, there an additional partition that is appended to the local-name partitions.
Table D-5.
Additional
initial
local-name
string table entries
for schema-informed EXI streams
Partition
String ID
String Value
XSD-NS
"ENTITIES"
XSD-NS
"ENTITY"
XSD-NS
"ID"
XSD-NS
"IDREF"
XSD-NS
"IDREFS"
XSD-NS
"NCName"
XSD-NS
"NMTOKEN"
XSD-NS
"NMTOKENS"
XSD-NS
"NOTATION"
XSD-NS
"Name"
XSD-NS
10
"QName"
XSD-NS
11
"anySimpleType"
XSD-NS
12
"anyType"
XSD-NS
13
"anyURI"
XSD-NS
14
"base64Binary"
XSD-NS
15
"boolean"
XSD-NS
16
"byte"
XSD-NS
17
"date"
XSD-NS
18
"dateTime"
XSD-NS
19
"decimal"
XSD-NS
20
"double"
XSD-NS
21
"duration"
XSD-NS
22
"float"
XSD-NS
23
"gDay"
XSD-NS
24
"gMonth"
XSD-NS
25
"gMonthDay"
XSD-NS
26
"gYear"
XSD-NS
27
"gYearMonth"
XSD-NS
28
"hexBinary"
XSD-NS
29
"int"
XSD-NS
30
"integer"
XSD-NS
31
"language"
XSD-NS
32
"long"
XSD-NS
33
"negativeInteger"
XSD-NS
34
"nonNegativeInteger"
XSD-NS
35
"nonPositiveInteger"
XSD-NS
36
"normalizedString"
XSD-NS
37
"positiveInteger"
XSD-NS
38
"short"
XSD-NS
39
"string"
XSD-NS
40
"time"
XSD-NS
41
"token"
XSD-NS
42
"unsignedByte"
XSD-NS
43
"unsignedInt"
XSD-NS
44
"unsignedLong"
XSD-NS
45
"unsignedShort"
E Deriving
Set of Characters
from XML Schema Regular Expressions
XML Schema datatypes specification
[XML Schema Datatypes]
defines a
regular expression
XS2
syntax for use in pattern facets of simple type definitions. Pattern facets constrain the set of valid values to those that lexically match the specified regular expression. This section describes the rules for deriving the set of characters allowed in a string value that conforms to a given regular expression in an XML Schema.
In the following description, the term "set-of-chars" is used as the shorthand form of "set of characters".
At the top level,
the XML Schema regular expression
syntax is summarized by the following production excerpted here from
[XML Schema Datatypes]
. Note the notation used for the numbers that tag the productions. "XSD:" is prefixed to the original numeric tags to make it easier to discern them as belonging to XML Schema specification.
[XSD:1]
XS2
regExp  ::=  branch  (  '|'  branch  )*
The set-of-chars for a regex that conforms to the syntax above is the union of the set-of-chars defined for each branch. Each branch of a regex is described by the following production:
[XSD:2]
XS2
branch  ::=  piece*
The set-of-chars for each branch of a regex is the union of the set-of-chars for each piece of the branch. Each piece of a branch is described by the following production:
[XSD:3]
XS2
piece  ::=  atom  quantifier?
The set-of-chars for each piece of a branch is the set-of-chars for the atom portion of the piece. The atom portion of a piece is described by the following production:
[XSD:9]
XS2
atom  ::=  Char  |  charClass  |  (  '('  regExp  ')'  )
The set-of-chars for the atom is the set-of-chars for the Char, charClass or regExp that constitutes the atom.
The set-of-chars for a Char that constitutes an atom contains the single character that matches the
Char expression
XS2

The set-of-chars for a charClass that constitutes an atom is the set of characters specified by the
charClass expression
XS2

The set-of-chars for a regExp sub-expression enclosed in parenthesis that constitutes an atom is the set-of-chars for the regExp itself derived by recursively applying the rule defined above.
F Content Coding and Internet Media Type
Two labels are defined for identifying and negotiating the use of EXI for representing XML information in higher-level protocols. They serve two distinct roles. One is for content coding and the other is for internet media type.
F.1 Content Coding
The content-coding value "exi" is registered with the Internet Assigned Numbers Authority (IANA) for use with EXI. Protocols that can identify and negotiate the content coding of
XML
information independent of its media type,

SHOULD use the content coding "exi" (case-insensitive) to convey the acceptance or actual use of EXI encoding for XML information.
F.2 Internet Media Type
A new media type registration "application/exi" described below is being proposed for community review, with the intent to eventually submit it to the IESG for review, approval, and registration with IANA.
Type name:
application
Subtype name:
exi
Required parameters:
none
Optional parameters:
none
Encoding considerations:
binary
Security considerations:
When used as an XML replacement in an application, EXI shares
the same security concerns as XML, described in IETF RFC 3023
[IETF RFC 3023]
section 10.
In addition to concerns shared with XML, the schema identifier
refers to information external to the EXI document itself. If
an attacker is able to substitute another schema in place of
the intended one, the semantics of the EXI document could be
changed in some ways. As an example, EXI is sensitive to the
order of the values in an enumeration. It is not known whether
such an attack is possible on the actual structure of the
document.
Also, EXI supports user-defined datatype representations, and such
representations, if present in a document and purportedly understood by
a processor, can be a security weakness. Definitions of these
representations are expected to be external, often application- or
industry-specific, so any definition needs to be analyzed carefully from
the security perspective before being adopted.
Interoperability considerations:
The datatype representation map feature of EXI requires
coordination between the producer and consumer of an EXI
document, and is not recommended except in controlled
environments or using standardized datatype representations
potentially defined in the future.
EXI permits information necessary to decode a document to be
omitted with the expectation that such information has been
communicated out of band. Such omissions hinder
interoperability in uncontrolled environments.
Published specification:
Efficient XML Interchange (EXI) Format 1.0, World Wide Web
Consortium
Applications that use this media type:
No known applications currently use this media type.
Additional information:
Magic number(s):
The first four octets may be hexadecimal 24 45 58 49 ("$EXI").
The first octet after these, or the first octet of the whole
content if they are not present, has its high two bits set to
values 1 and 0 in that order.
File extension(s):
.exi
Macintosh file type code(s):
APPL
Consideration of alternatives :
When transferring EXI streams over a protocol that can identify and negotiate the content coding of XML information independent of its media-type, the content-coding should be used to identify and negotiate how the XML information is encoded and the media-type should be used to negotiate and identify what type of information is transfered.
Person & email address to contact for further information:
World Wide Web Consortium
Intended usage:
COMMON
Restrictions on usage:
none
Author/Change controller:
The EXI specification is the product of the World Wide Web
Consortium's Efficient XML Interchange Working Group. The W3C
has change control over this specification.
G Example Encoding (Non-Normative)
EXI Primer
[EXI Primer]
contains a section that explains the workings of EXI format using simple example documents. Those examples are intended to serve as a tool to confirm the understanding of the EXI format in action by going through encoding and decoding processes step by step.
H Schema-informed Grammar Examples (Non-Normative)
As an example to exercise the process to produce schema-informed element grammars, consider the following XML Schema fragment declaring two complex-typed elements, and :
Example H-1.
Example XML Schema fragment

Section
H.1 Proto-Grammar Examples
guides you through the process of
generating
EXI proto-grammars from the schema components available in the example schema above. EXI grammars in the normalized form that correspond to the proto-grammars are shown in section
H.2 Normalized Grammar Examples
. Section
H.3 Complete Grammar Examples
shows the complete EXI grammars for elements and .
H.1 Proto-Grammar Examples
Grammars for element declaration terms "description", "quantity" and "price" are as follows. See section
8.5.4.1.6 Element Terms
for the rules used to
generate grammars for element terms.
Term_description
Term_description
SE(
"description"
Term_description
Term_description
EE
Term_quantity
Term_quantity
SE(
"quantity"
Term_quantity
Term_quantity
EE
Term_price
Term_price
SE(
"price"
Term_price
Term_price
EE
The grammar for element particle "description" is

created based on
Term_description
given { minOccurs } value of 0 and { maxOccurs } value of 1. See section
8.5.4.1.5 Particles
for the rules used to
generate grammars for particles.
Particle_description
Term_description
SE(
"description"
Term_description
EE
Term_description
EE
Grammars for element particle "quantity" and "prices" are the same as those of their terms (
Term_quantity
and
Term_price
, respectively) because { minOccurs } and { maxOccurs } are both 1.
Particle_quantity
Term_quantity
SE(
"quantity"
Term_quantity
Term_quantity
EE
Particle_price
Term_price
SE(
"price"
Term_price
Term_price
EE
The grammar for the sequence group term in element declaration is
created based on
the grammars of subordinate particles as follows. See section
8.5.4.1.8.1 Sequence Model Groups
for the rules used to
generate grammars for sequence groups.
Term_sequence
Particle_description
Particle_quantity
Particle_price
which yields the following grammars for
Term_sequence
Term_sequence
Term_description
SE("description")
Term_description
Term_quantity
Term_description
Term_quantity
Term_quantity
SE("quantity")
Term_quantity
Term_quantity
Term_price
Term_price
SE("price")
Term_price
Term_price
EE
The grammar for the particle that is the content model of element
is created based on
Term_sequence
(shown above) given { minOccurs } value of 1 and { maxOccurs } value of 2. See section
8.5.4.1.5 Particles
for the rules used to
generate grammars for particles.
Particle_sequence
Term_description
0,0
SE("description")
Term_description
0,1
Term_quantity
0,0
Term_description
0,1
Term_quantity
0,0
Term_quantity
0,0
SE("quantity")
Term_quantity
0,1
Term_quantity
0,1
Term_price
0,0
Term_price
0,0
SE("price")
Term_price
0,1
Term_price
0,1
Term_description
1,0
Term_description
1,0
SE("description")
Term_description
1,1
Term_quantity
1,0
EE
Term_description
1,1
Term_quantity
1,0
Term_quantity
1,0
SE("quantity")
Term_quantity
1,1
Term_quantity
1,1
Term_price
1,0
Term_price
1,0
SE("price")
Term_price
1,1
Term_price
1,1
EE
Grammars for attribute uses of attributes "sku" and "color" are as follows. See section
8.5.4.1.4 Attribute Uses
for the rules used to
generate grammars for attribute uses.
Use_sku
Use_sku
AT("sku") [schema-typed value]
Use_sku
Use_sku
EE
Use_color
Use_color
AT("color") [schema-typed value]
Use_color
EE
Use_color
EE
Note the subtle difference between
the forms of the two
grammars
Use_sku
and
Use_color
At the outset of the grammars,
only
Use_color
contains a production of which the right-hand side starts with EE, which
is the result of
the difference in their occurrence
requirement
defined in the schema.
Finally, the grammar for the element is
created based on
the grammars of its attribute uses and content model particle as follows. See section
8.5.4.1.3.2 Complex Type Grammars
for the rules used to
generate grammars for complex types.
ProtoG_ProductElement
Use_color
Use_sku
Particle_sequence
which yields the following grammar for element .
ProtoG_ProductElement
Use_color
AT("color") [schema-typed value]
Use_color
Use_sku
Use_color
Use_sku
Use_sku
AT("sku") [schema-typed value]
Use_sku
Use_sku
Term_description
0,0
Term_description
0,0
SE("description")
Term_description
0,1
Term_quantity
0,0
Term_description
0,1
Term_quantity
0,0
Term_quantity
0,0
SE("quantity")
Term_quantity
0,1
Term_quantity
0,1
Term_price
0,0
Term_price
0,0
SE("price")
Term_price
0,1
Term_price
0,1
Term_description
1,0
Term_description
1,0
SE("description")
Term_description
1,1
Term_quantity
1,0
EE
Term_description
1,1
Term_quantity
1,0
Term_quantity
1,0
SE("quantity")
Term_quantity
1,1
Term_quantity
1,1
Term_price
1,0
Term_price
1,0
SE("price")
Term_price
1,1
Term_price
1,1
EE
The other element declaration can be processed
to generate its proto-grammar in a similar fashion as follows.
Term_product
Term_product
SE(
"product"
Term_product
Term_product
EE
The grammar for element particle "product" is created based on
Term_product
given { minOccurs } value of 1 and { maxOccurs } value of
unbounded
. See section
8.5.4.1.5 Particles
for the rules used to generate grammars for particles.
Particle_product
(before simplification)
Term_product
0,0
SE("product")
Term_product
0,1
Term_product
0,1
Term_product
1,0
Term_product
1,0
SE("product")
Term_product
1,1
EE
Term_product
1,1
Term_product
1,0
In the above grammars, two grammars
Term_product
0,1
and
Term_product
1,1
are redundant because they serve for no other purpose than simply relaying one non-terminal to another. Though it is not required, the uses of non-terminals
Term_product
0,1
and
Term_product
1,1
are each replaced by
Term_product
1,0
and
Term_product
1,0
, which produces the following
simplified
proto-grammars.
Particle_product
(after simplification)
Term_product
0,0
SE("product")
Term_product
1,0
Term_product
1,0
SE("product")
Term_product
1,0
EE
The proto-grammar of the element equates to
Particle_product
because the type definition of element has no attribute uses, and its content model has both { minOccurs } and { maxOccurs } property values of 1 where the element particle "product" is the sole member of the content model.
ProtoG_OrderElement
Term_product
0,0
SE("product")
Term_product
1,0
Term_product
1,0
SE("product")
Term_product
1,0
EE
H.2 Normalized Grammar Examples
The element proto-grammars
ProtoG_ProductElement
and
ProtoG_OrderElement
produced in the previous section can be turned into their normalized forms which are shown below with an
event code
assigned to each production. See section
8.5.4.2 EXI Normalized Grammars
for the process that converts proto-grammars into normalized grammars, and section
8.5.4.3 Event Code Assignment
for the rules that determine the
event codes
of productions in normalized grammars.
NormG_ProductElement
Event Code
Use_color
AT("color") [schema-typed value]
Use_color
AT("sku") [schema-typed value]
Use_sku
Use_color
AT("sku") [schema-typed value]
Use_sku
Use_sku
SE("description")
Term_description
0,1
SE("quantity")
Term_quantity
0,1
Term_description
0,1
SE("quantity")
Term_quantity
0,1
Term_quantity
0,1
SE("price")
Term_price
0,1
Term_price
0,1
SE("description")
Term_description
1,1
SE("quantity")
Term_quantity
1,1
EE
Term_description
1,1
SE("quantity")
Term_quantity
1,1
Term_quantity
1,1
SE("price")
Term_price
1,1
Term_price
1,1
EE
NormG_OrderElement
Event Code
Term_product
0,0
SE("product")
Term_product
1,0
Term_product
1,0
SE("product")
Term_product
1,0
EE
Note that some
productions
that were present in the proto-grammars have been removed in the normalized grammars.

Those
productions
were culled upon the completion of grammar normalization because their left-hand-side non-terminals are not referenced from right-hand side of any available productions,

and yet those non-terminals are not the first non-terminals of the grammar they belong to.
H.3 Complete Grammar Examples
The normalized grammars
NormG_ProductElement
and
NormG_OrderElement
are augmented with undeclared productions to become complete grammars.
See section
8.5.4.4 Undeclared Productions
for the process
to augment normalized grammars with productions for accepting terminal symbols not declared in schemas.
The complete grammars for elements and are shown below.
Note that the default grammar settings (i.e. the settings that can be described by an empty header options document is used for the sake of this augmentation process, and those productions that accept ER, NS, CM and PI have been pruned according to the rules described in section
8.3 Pruning Unneeded Productions
since those
terminal symbols
are not preserved in the default grammar settings.
Complete grammar for element
Event Code
Use_color
AT("color") [schema-typed value]
Use_color
AT("sku") [schema-typed value]
Use_sku
EE
2.0
AT(xsi:type)
Use_color
2.1
AT(xsi:nil)
Use_color
2.2
AT (*)
Use_color
2.3
AT("color") [untyped value]
Use_color
2.4.0
AT("sku") [untyped value]
Use_sku
2.4.1
AT (*) [untyped value]
Use_color
2.4.2
SE(*)
Use_sku
1_copied
2.5
CH [untyped value]
Use_sku
1_copied
2.6
Use_color
AT("sku") [schema-typed value]
Use_sku
EE
1.0
AT (*)
Use_color
1.1
AT("sku") [untyped value]
Use_sku
1.2.0
AT (*) [untyped value]
Use_color
1.2.1
SE(*)
Use_sku
1_copied
1.3
CH [untyped value]
Use_sku
1_copied
1.4
Use_sku
SE("description")
Term_description
0,1
SE("quantity")
Term_quantity
0,1
EE
2.0
AT (*)
Use_sku
2.1
AT (*) [untyped value]
Use_sku
2.2.0
SE(*)
Use_sku
1_copied
2.3
CH [untyped value]
Use_sku
1_copied
2.4
Use_sku
1_copied
SE("description")
Term_description
0,1
SE("quantity")
Term_quantity
0,1
EE
2.0
SE(*)
Use_sku
1_copied
2.1
CH [untyped value]
Use_sku
1_copied
2.2
Term_description
0,1
SE("quantity")
Term_quantity
0,1
EE
SE(*)
Term_description
0,1
2.0
CH [untyped value]
Term_description
0,1
2.1
Term_quantity
0,1
SE("price")
Term_price
0,1
EE
SE(*)
Term_quantity
0,1
2.0
CH [untyped value]
Term_quantity
0,1
2.1
Term_price
0,1
SE("description")
Term_description
1,1
SE("quantity")
Term_quantity
1,1
EE
SE(*)
Term_price
0,1
3.0
CH [untyped value]
Term_price
0,1
3.1
Term_description
1,1
SE("quantity")
Term_quantity
1,1
EE
SE(*)
Term_description
1,1
2.0
CH [untyped value]
Term_description
1,1
2.1
Term_quantity
1,1
SE("price")
Term_price
1,1
EE
SE(*)
Term_quantity
1,1
2.0
CH [untyped value]
Term_quantity
1,1
2.1
Term_price
1,1
EE
SE(*)
Term_price
1,1
1.0
CH [untyped value]
Term_price
1,1
1.1
Complete grammar for element
Event Code
Term_product
0,0
SE("product")
Term_product
1,0
EE
1.0
AT(xsi:type)
Term_product
0,0
1.1
AT(xsi:nil)
Term_product
0,0
1.2
AT (*)
Term_product
0,0
1.3
AT (*) [untyped value]
Term_product
0,0
1.4.0
SE(*)
Term_product
0,0_copied
1.5
CH [untyped value]
Term_product
0,0_copied
1.6
Term_product
0,0_copied
SE("product")
Term_product
1,0
EE
1.0
SE(*)
Term_product
0,0_copied
1.1
CH [untyped value]
Term_product
0,0_copied
1.2
Term_product
1,0
SE("product")
Term_product
1,0
EE
SE(*)
Term_product
1,0
2.0
CH [untyped value]
Term_product
1,0
2.1
I Recent Specification Changes (Non-Normative)
I.1 Changes from Last Call Working Draft
Defined schema-order used for sorting productions in schema-informed element and type grammars.
(see
8.5.4.3 Event Code Assignment
Added a caveat regarding the use of content-coding in EXI, clarifying that it is applicable only to XML documents and it is neither byte- nor character-preserving.
(see
F.1 Content Coding
Hour, Minute and Second values are comparted into separate sequence of bits in order to make it easier to retrieve each values from Date-Time representation.
(see
7.1.8 Date-Time
Made it clear that value content items of type xsd:QName and xsd:NOTATION are represented as String
with exception of xsi:type attribute whose representation uses xsd:QName.
(see
Table 7-1
Described the behaviour expected of an EXI processor when it encountered EXI streams that have EXI versions not implemented by the processor.
(see
5.3 EXI Format Version
AT (*) and/or SE (*) have been permitted on the right-hand side of Schema-informed Element Fragment Grammar.
(see
8.5.3 Schema-informed Element Fragment Grammar
A condition was specified that makes those elements of the same name be excluded from the application of Relaxed Element Fragment grammar. Likewise, a condition that excludes attributes of the same name from the alternative application of String datatype representation.
(see
8.5.2 Schema-informed Fragment Grammar
and
8.5.3 Schema-informed Element Fragment Grammar
Schema-informed Element Fragment Grammar is treated as though it was derived from an element declaration with a {nillable} property value of true and a type declaration that has named sub-types, when the grammar goes through the process of undeclared productions augmentation.
(see
8.5.3 Schema-informed Element Fragment Grammar
Added missing semantics for AT (*) productions in Complex Type Grammars and Complex Ur-Type Grammar.
(see
8.5.4.1.3.2 Complex Type Grammars
and
8.5.4.1.3.3 Complex Ur-Type Grammar
Added a note indicating xsi:type and xsi:nil attributes cannot be used together on the same element when strict is true.(see
8.5.4.4.2 Adding Productions when Strict is True
Clarified conditions under which xsi:nil values are stored in structure channel.
(see
9.2.1 Structure Channel
Added simple-type definitions for built-in EXI datatype representations to EXI options document schema.
(see
C XML Schema for EXI Options Document
Fixed issue in semantics that prohibited learning of AT(xsi:type) and AT(xsi:nil) in
built-in element grammar
(see
8.4.3 Built-in Element Grammar
In appendix section
E Deriving
Set of Characters
from XML Schema Regular Expressions
the terms "character set" and "charset" have been rephrased as "set of characters" and "set-of-chars", respectively. This is so as to avoid conjuring up some different connotations that these terms often bear in documents such as XML Schema specification.
It was made clear that "strict" element is not permitted to appear alongside "selfContained" element in an options document. (see the term definitions of "
strict option
" and "
selfContained option
")
xsi:type is now allowed on all union types when
strict option
is true. (see
8.5.4.4.2 Adding Productions when Strict is True
It has been made clearer that xsi:type and/or xsi:nil attribute events MUST occur before any other attribute events of the same element.(see
8.4.3 Built-in Element Grammar
8.5.4.1.3.2 Complex Type Grammars
8.5.4.4.1 Adding Productions when Strict is False
and
8.5.4.4.2 Adding Productions when Strict is True
The order of attribute occurrences when schema-informed grammar is in effect was relaxed to be in any order as long as the grammar permits it. (see
6. Encoding EXI Streams
The event after a self-contained element starts at an octet boundary. (see
8.4.3 Built-in Element Grammar
and see
8.5.4.4.1 Adding Productions when Strict is False
The provision for selfContained option was removed from section
8.5.4.4.2 Adding Productions when Strict is True
This removal does not cause any change to the format, because selfContained and strict options are mutually exclusive, therefore the provision there could never be utilized thus was was simply extraneous.
SE(*) semantics has been clarified with regards to how grammars are resolved given an element
qname
The initial entries of local-name string table partitions has been split into two, where the second contains entries for the XSD-NS partition and gets appended to first only when XML Schemas are used. (see
D.3 Initial Entries in Local-Name Partitions
It is now explicitly stated that Preserve.prefixes fidelity option SHOULD be turned on when qualified names are used in value content items. (see
6.3 Fidelity Options
Defined the term "
global element grammars
" for use in describing the SE (*) semantics.
Fixed various bugs that were present in the examples described in
H Schema-informed Grammar Examples
Sorted the initial entries in local-name partition of string tables were sorted in lexicographical order.
(see
D.3 Initial Entries in Local-Name Partitions
Simplified the way in which restricted character sets are derived from datatypes with pattern facets.
(see
7.1.10.1 Restricted Character Sets
and
E Deriving
Set of Characters
from XML Schema Regular Expressions
Added references to XML 1.1 alongside those of XML 1.0 to consistently indicate that EXI can be used with both versions of XML.
(see
3. Basic Concepts
and
5.2 Distinguishing Bits
Round-robin style mechanism of entry reuse for
value
string table partitions is described.
(see
7.3.3 Partitions Optimized for Frequent use of String Literals
Added exi:time, exi:date, exi:gYearMonth, exi:gYear, exi:gMonthDay, exi:gDay and exi:gMonth to
Table 7-2
Appended an empty content grammar to the definition of
TypeEmpty
in "8.5.4.1.3.2 Complex Type Grammars".
(see
8.5.4.1.3.2 Complex Type Grammars
Changed the threshold number used for determining the use of restricted character sets from 255 to 256.
(see
7.1.10.1 Restricted Character Sets
Implementations are allowed to choose whether to use local or global table hit when both are available
for representing string values.
(see
7.3.3 Partitions Optimized for Frequent use of String Literals
Pattern facets defined for built-in schema datatypes are disregarded when computing restricted character sets.
(see
7.1.10.1 Restricted Character Sets
The prefix string table partitions are made use of in the encoding of the prefix component of QName.
(see
7.1.7 QName
Changed the default value of schemaID, datatypeRepresentationMap and [user defined meta-data] options to "no default value" in
Table 5-1
It is made clear that [user defined meta-data] option cannot modify the EXI format.
(see
5.4 EXI Options
The predicates used for AT and CH terminal symbols (i.e. those previously denoted as [schema-valid value] and [schema-invalid value]) have been changed to [schema-typed value] and [untyped value] to more accurately reflect the associated semantics.
Updated the section on content coding to reflect that the "exi" content coding is now in the IANA registry.
(see
F.1 Content Coding
Added a non-terminal
TypeEmpty
ur-type,
to the complex ur-type grammar
TypeEmpty
ur-type
, and defined the content index of the
TypeEmpty
ur-type
grammar.
(see
8.5.4.1.3.3 Complex Ur-Type Grammar
Removed the paragraph that described the processor conformance defining the minimal value range required of blockSize option support.
(see
10.2 EXI Processor Conformance
It was made clear that the datatypeRepresentationMap option does not take effect when the value of the preserve.lexicalValues fidelity option is true.
(See the definition of datatypeRepresentationMap in
5.4 EXI Options
It was made clear that the section
7.1.10.1 Restricted Character Sets
only applies to datatypes
derived from xsd:string.
The maximum size of the range used for determining the use of
-bit Unsigned Integer for representing values of xsd:integer was changed from 4095 to 4096.
(see
Table 7-1
and
7.1.9 n-bit Unsigned Integer
).
Corrected the semantics associated with the production of the form
LeftHandSide
: EE in section
8.4.3 Built-in Element Grammar
Added
ElementFragmentTypeEmpty
grammar to the schema-informed element fragment grammar, and defined the
content
index of grammars
ElementFragment
and
ElementFragmentTypeEmpty
(see
8.5.3 Schema-informed Element Fragment Grammar
The target namespace name of the XML Schema for EXI options document was changed to "http://www.w3.org/2009/exi".
(see
C XML Schema for EXI Options Document
Updated the semantics associated with xsi:type and xsi:nil attributes to ensure they are encoded property when the value of preserve.lexicalValues option is true.
Removed the reference to Unicode 5.0.0 character database. Section
E Deriving
Set of Characters
from XML Schema Regular Expressions
no longer has dependency on Unicode 5.0.0. A normative reference to XML Schema datatypes specification
[XML Schema Datatypes]
in regards to processing regular expressions effectively have
character class escapes
XS2
be each resolved into a set of characters using the exact version of XML and Unicode, namely XML 1.0 (Second Edition) and Unicode 3.1.0.
Added the "processContents" attribute with value "skip" to the wildcard terms in the XML Schema for EXI options document.
(see
C XML Schema for EXI Options Document
Added a statement that suggests to turn on Preserve.prefixes fidelity option when the Preserve.lexicalValues fidelity option is true and xsi:type attributes are used in the EXI stream.
(see
6.3 Fidelity Options
The EXI datatype identifier exi:integer represents the Integer datatype representation, and
Table 7-1
has been updated to accurately reflect the association. The rule previously applied to xsd:Integer datatype in that table is now described in section
7.1.5 Integer
Two qname definitions "ieeeBinary32" and "ieeeBinary64" have been given in the EXI options document schema.
(see
C XML Schema for EXI Options Document
Implentations are required to support all Unsigned Integer values less than 2147483648, whereas the threshold value was previously 4294967296.
(see
7.1.6 Unsigned Integer
I.2 Changes from Fourth Public Working Draft
Added the section
F Content Coding and Internet Media Type
I.3
Changes from Third Public Working Draft
Added the section
10. Conformance
EXI Cookie
was introduced in the EXI header in order to provide a facility to distinguish EXI streams from a broader range of document types used on the Web. (see
5.1 EXI Cookie
Header options fields
valueMaxLength
and
valuePartitionCapacity
have beed added in EXI options to limit the length of a string and the total number of strings that are put into value partitions of a string table.
Added support for mixed and empty content model in complex type grammar generation.
(see
8.5.4.1.3.2 Complex Type Grammars
Described how
built-in element grammars
handle xsi:type and xsi:nil attribute occurrences in schema-informed EXI streams.
(see
8.4.3 Built-in Element Grammar
Defined the grammar that represents xsd:anyType.
(see
8.5.4.1.3.3 Complex Ur-Type Grammar
Described the restricted character sets used for certain typed values when
preserve.lexicalValues
is true.
(see
Table 7-2
Added a section
8.5.3 Schema-informed Element Fragment Grammar
describing the content of elements declared with the same qname when they occur inside an EXI fragment or EXI Element Fragment.
Added
selfContained
option for creating elements that can be indexed for random access.
(see
5.4 EXI Options
Added mention of Unsigned Integer size that EXI processors SHOULD and MUST support.
(see
7.1.6 Unsigned Integer
Changed the order of productions representing CH events
during event code assignment.
(see
8.5.4.3 Event Code Assignment
Added semantics for empty "schemaID" element value.
(see
5.4 EXI Options
The term "CODEC" has been replaced by a more appropriate term "datatype representation" throughout this document.
Added support for parameterized attribute wildcards in the generation of complex type grammars.
(see
8.5.4.1.3.2 Complex Type Grammars
I.4
Changes from Second Public Working Draft
Added
strict option
(see
5.4 EXI Options
and
8.5.4.4.2 Adding Productions when Strict is True
).
The order of content items for NS event has been corrected. (see
Table 4-1
Improved the description of
EXI Options document
to make it clear that its representation does not start with an EXI header (see section
5.4 EXI Options
).
Reworked the section
8.5.4.4 Undeclared Productions
for grammar system accuracy as well as describing how grammars are augumented with undeclared productions when
strict option
is true.
The indicator content item in NS event type has been renamed to"local-element-ns", and improved its description for clarity (see section
4. EXI Streams
).
The calculus of the MonthDay component used in Date-Time representation has been corrected (see section
7.1.8 Date-Time
).
TypeEmpty
is created for each type grammars to facilitate xsi:nil handling (see section
8.5.4.1.3 Type Grammars
).
Described how the use of substitution groups in schemas affects the grammars (see section
8.5.4.1.6 Element Terms
).
Described how the namespace constraints of wildcard terms are factored into the grammars. (see section
8.5.4.1.7 Wildcard Terms
).
Regular expressions do not apply to constrain character sets when regexps contain certain category escapes (see section
E Deriving
Set of Characters
from XML Schema Regular Expressions
).
I.5
Changes from First Public Working Draft
Specified how schema-informed grammars are derived from available XML Schemas (see section
8.5.4 Schema-informed Element and Type Grammars
).
Described how QNames are represented with prefixes when prefix preservation is turned on (see section
7.1.7 QName
).
The
alignment
option was introduced in
EXI Options
to support
pre-compression
stream as well as plain
byte-alignment
stream, in addition to the default
bit-packed
representation.
Described how the presence of pattern facets affects the encoding of EXI Boolean (see section
7.1.2 Boolean
) and EXI String (see section
7.1.10 String
) representation.
Values typed as integer with bounded range of 4095 or smaller are now represented as
-bit Unsigned Integers (see section
7.1.9 n-bit Unsigned Integer
).
Added a section that lists additional items that have been advised and are currently under consideration.
J Acknowledgements (Non-Normative)
This document is the work of the
Efficient XML Interchange (EXI) WG
Members of the Working Group are (at the time of writing, sorted alphabetically by last name):
Carine Bournez, W3C/ERCIM (
staff contact
Don Brutzman, Web3D Consortium
Michael Cokus, MITRE Corporation
co-chair
Youenn Fablet, Canon, Inc.
Jun Fujisawa, Canon, Inc.
Joerg Heuer, Siemens AG
Alan Hudson, Web3D Consortium
Takuki Kamiya, Fujitsu Laboratories of America, Inc.
co-chair
Jaakko Kangasharju, University of Helsinki
Richard Kuntschke, Siemens AG
Don McGregor, Web3D Consortium
Daniel Peintner, Siemens AG
Santiago Pericas-Geertsen, Sun Microsystems, Inc.
Liam Quin, W3C/MIT (
staff contact
Rich Rollman, AgileDelta, Inc.
Paul Sandoz, Sun Microsystems, Inc.
John Schneider, AgileDelta, Inc.
Sheldon Snyder, Web3D Consortium
Greg White, Stanford University (
former co-chair
The EXI Working Group would like to acknowledge the following former members of the group for their leadership, guidance and expertise they provided throughout their individual tenure in the WG. (sorted alphabetically by last name)
Robin Berjon, Expway (
former co-chair
) (until 17 October 2006)
Oliver Goldman, Adobe Systems, Inc. (
former co-chair
) (until 08 June 2006)
Peter Haggar, IBM (until 07 March 2007)
Kimmo Raatikainen, Nokia (until 13 March 2008)
Paul Thorpe, OSS Nokalva, Inc. (until 11 Sept 2007)
Daniel Vogelheim, Invited Expert (
former co-chair
then from Siemens AG) (until 15 July 2008)
Stephen Williams, High Performance Technologies, Inc. (until 8 Aug 2008)
Ed Day, Objective Systems, Inc. (until 23 Oct 2009)
The EXI working group owes so much to our distinguished colleague from Nokia, Kimmo Raatikainen (1955-2008), on the progress of our work, who succumbed to an ailment on March 13, 2008. His breadth of knowledge, depth of insight, ingenuity and courage to speak up constantly shed a light onto us whenever the group seemed to stray into a futile path of disagreements during the course. We shall never forget and will always appreciate his presence in us, and great contribution that is omnipresent in every aspect of our work throughout.