XProc: An XML Pipeline Language
W3C Recommendation 11 May 2010
This Version:
Latest Version:
Previous versions:
Editors:
Norman Walsh
MarkLogic Corporation
norman.walsh@marklogic.com
Alex Milowski
Invited expert
alex@milowski.org
Henry S. Thompson
University of Edinburgh
ht@inf.ed.ac.uk
Please refer to the errata for this document, which may include some
normative corrections.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2010 W3C (MIT, ERCIM, Keio), All Rights Reserved. W3C
liability, trademark and document use rules apply.
Abstract
This specification describes the syntax and semantics of XProc: An
XML Pipeline Language, a language for describing operations to be
performed on XML documents.
An XML Pipeline specifies a sequence of operations to be
performed on zero or more XML documents. Pipelines generally accept
zero or more XML documents as input and produce zero or more XML
documents as output. Pipelines are made up of simple steps which
perform atomic operations on XML documents and constructs similar
to conditionals, iteration, and exception handlers which control
which steps are executed.
Status of this Document
This section describes the status of this document at the
time of its publication. Other documents may supersede this
document. A list of current W3C publications and the latest
revision of this technical report can be found in the
W3C technical reports index.
This document is a W3C Recommendation. It implements the
requirements and use cases documented in [XProc Requirements].
This document is a product of the XML Processing Model Working
Group as part of the W3C XML Activity.
This document has been reviewed by W3C Members, by software
developers, and by other W3C groups and interested parties, and is
endorsed by the Director as a W3C Recommendation. It is a stable
document and may be used as reference material or cited from
another document. W3C's role in making the Recommendation is to
draw attention to the specification and to promote its widespread
deployment. This enhances the functionality and interoperability of
the Web.
Please report errors in this document to the public mailing list
public-xml-processing-model-comments@w3.org (public archives are
available).
There is an Implementation Report for XProc. It documents the
performance of implementations against the XProc Test Suite.
This document was produced by a group operating under the
February 2004 W3C Patent Policy. W3C maintains a public list of any
patent disclosures made in connection with the deliverables of
the group; that page also includes instructions for disclosing a
patent. An individual who has actual knowledge of a patent which
the individual believes contains Essential Claim(s) must disclose
the information in accordance with section 6 of the W3C Patent
Policy.
Table of Contents
1 Introduction
2 Pipeline Concepts
  2.1 Steps
    2.1.1 Step names
  2.2 Inputs and Outputs
    2.2.1 External Documents
    2.2.2 Non-XML Documents
  2.3 Primary Inputs and Outputs
  2.4 Connections
    2.4.1 Namespace Fixup on Outputs
  2.5 Environment
  2.6 XPaths in XProc
    2.6.1 XPath 1.0 processors
    2.6.2 XPath 2.0 processors
  2.7 XPath Extension Functions
    2.7.1 System Properties
    2.7.2 Step Available
    2.7.3 Value Available
    2.7.4 Iteration Position
    2.7.5 Iteration Size
    2.7.6 Base URI
    2.7.7 Resolve URI
    2.7.8 Version Available
    2.7.9 XPath Version Available
    2.7.10 Other XPath Extension Functions
  2.8 PSVIs in XProc
  2.9 Variables
  2.10 Options
  2.11 Parameters
  2.12 Security Considerations
  2.13 Versioning Considerations
    2.13.1 Backwards-compatible Mode
    2.13.2 Forwards-compatible Mode
3 Syntax Overview
  3.1 XProc Namespaces
  3.2 Scoping of Names
  3.3 Base URIs and xml:base
  3.4 Unique identifiers
  3.5 Associating Documents with Ports
  3.6 Documentation
  3.7 Processor annotations
  3.8 Extension attributes
  3.9 Conditional Element Exclusion
  3.10 Syntax Summaries
  3.11 Common errors
4 Steps
  4.1 p:pipeline
  4.2 p:for-each
    4.2.1 XPath Context
  4.3 p:viewport
    4.3.1 XPath Context
  4.4 p:choose
    4.4.1 p:xpath-context
    4.4.2 p:when
    4.4.3 p:otherwise
  4.5 p:group
  4.6 p:try
    4.6.1 The Error Vocabulary
  4.7 Atomic Steps
  4.8 Extension Steps
    4.8.1 Syntactic Shortcut for Option Values
5 Other pipeline elements
  5.1 p:input
    5.1.1 Document Inputs
    5.1.2 Parameter Inputs
  5.2 p:iteration-source
  5.3 p:viewport-source
  5.4 p:output
  5.5 p:log
  5.6 p:serialization
  5.7 Variables, Options, and Parameters
    5.7.1 p:variable
    5.7.2 p:option
    5.7.3 p:with-option
    5.7.4 p:with-param
    5.7.5 Namespaces on variables, options, and parameters
  5.8 p:declare-step
    5.8.1 Declaring atomic steps
    5.8.2 Declaring pipelines
  5.9 p:library
  5.10 p:import
  5.11 p:pipe
  5.12 p:inline
  5.13 p:document
  5.14 p:data
  5.15 p:empty
  5.16 p:documentation
  5.17 p:pipeinfo
6 Errors
  6.1 Static Errors
  6.2 Dynamic Errors
  6.3 Step Errors
7 Standard Step Library
  7.1 Required Steps
    7.1.1 p:add-attribute
    7.1.2 p:add-xml-base
    7.1.3 p:compare
    7.1.4 p:count
    7.1.5 p:delete
    7.1.6 p:directory-list
    7.1.7 p:error
    7.1.8 p:escape-markup
    7.1.9 p:filter
    7.1.10 p:http-request
    7.1.11 p:identity
    7.1.12 p:insert
    7.1.13 p:label-elements
    7.1.14 p:load
    7.1.15 p:make-absolute-uris
    7.1.16 p:namespace-rename
    7.1.17 p:pack
    7.1.18 p:parameters
    7.1.19 p:rename
    7.1.20 p:replace
    7.1.21 p:set-attributes
    7.1.22 p:sink
    7.1.23 p:split-sequence
    7.1.24 p:store
    7.1.25 p:string-replace
    7.1.26 p:unescape-markup
    7.1.27 p:unwrap
    7.1.28 p:wrap
    7.1.29 p:wrap-sequence
    7.1.30 p:xinclude
    7.1.31 p:xslt
  7.2 Optional Steps
    7.2.1 p:exec
    7.2.2 p:hash
    7.2.3 p:uuid
    7.2.4 p:validate-with-relax-ng
    7.2.5 p:validate-with-schematron
    7.2.6 p:validate-with-xml-schema
    7.2.7 p:www-form-urldecode
    7.2.8 p:www-form-urlencode
    7.2.9 p:xquery
    7.2.10 p:xsl-formatter
  7.3 Serialization Options
Appendices
A Conformance
  A.1 Implementation-defined features
  A.2 Implementation-dependent features
  A.3 Infoset Conformance
B References
  B.1 Normative References
  B.2 Informative References
C Glossary
D Pipeline Language Summary
E List of Error Codes
  E.1 Static Errors
  E.2 Dynamic Errors
  E.3 Step Errors
F Guidance on Namespace Fixup (Non-Normative)
G Handling Circular and Re-entrant Library Imports (Non-Normative)
H Sequential steps, parallelism, and side-effects
I The application/xproc+xml media type
  I.1 Registration of MIME media type application/xproc+xml
  I.2 Fragment Identifiers
1 Introduction
An XML Pipeline specifies a sequence of operations to be
performed on a collection of XML input documents. Pipelines take
zero or more XML documents as their input and produce zero or more
XML documents as their output.
A pipeline consists of steps. Like pipelines, steps take zero or more XML
documents as their inputs and produce zero or more XML documents as
their outputs. The inputs of a step come from the web, from the
pipeline document, from the inputs to the pipeline itself, or from
the outputs of other steps in the pipeline. The outputs from a step
are consumed by other steps, are outputs of the pipeline as a
whole, or are discarded.
There are three kinds of steps: atomic steps, compound steps,
and multi-container steps. Atomic steps carry out single operations
and have no substructure as far as the pipeline is concerned.
Compound steps and multi-container steps control the execution of
other steps, which they include in the form of one or more
subpipelines.
This specification defines a standard library, Section 7,
“Standard Step Library”, of steps. Pipeline implementations
may support additional types of steps as well.
Figure 1, “A simple, linear XInclude/Validate pipeline”, is a
graphical representation of a simple pipeline that performs XInclude
processing and validation on a document.
Figure 1. A simple, linear XInclude/Validate pipeline
This is a pipeline that consists of two atomic steps, XInclude
and Validate with XML Schema. The pipeline itself has two inputs,
“source” (a source document) and “schemas” (a sequence of W3C XML
Schemas). The XInclude step reads the pipeline input “source” and
produces a result document. The Validate with XML Schema step reads
the pipeline input “schemas” and the result of the XInclude step
and produces its own result document. The result of the validation,
“result”, is the result of the pipeline. (For consistency across
the step vocabulary, the standard input is usually named “source”
and the standard output is usually named “result”.)
The pipeline document determines how the steps are connected
together inside the pipeline, that is, how the output of one step
becomes the input of another.
The pipeline document for this pipeline is shown in Example 1,
“A simple, linear XInclude/Validate pipeline”.
Example 1. A simple, linear XInclude/Validate pipeline
version="1.0">
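A fully explicit pipeline of the kind described for Figure 1 might be
written as in the following sketch. The port names “source”, “schemas”,
and “result” come from the description above; the step names
(“xinclude-and-validate”, “included”, “validated”) are assumptions
chosen for illustration, and the details may differ from the published
listing.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="xinclude-and-validate"
                version="1.0">
  <!-- Pipeline inputs: one source document and a sequence of schemas -->
  <p:input port="source" primary="true"/>
  <p:input port="schemas" sequence="true"/>

  <!-- Pipeline output: explicitly connected to the validation result -->
  <p:output port="result">
    <p:pipe step="validated" port="result"/>
  </p:output>

  <!-- XInclude reads the pipeline input "source" -->
  <p:xinclude name="included">
    <p:input port="source">
      <p:pipe step="xinclude-and-validate" port="source"/>
    </p:input>
  </p:xinclude>

  <!-- Validation reads the XInclude result and the pipeline input "schemas" -->
  <p:validate-with-xml-schema name="validated">
    <p:input port="source">
      <p:pipe step="included" port="result"/>
    </p:input>
    <p:input port="schema">
      <p:pipe step="xinclude-and-validate" port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:declare-step>
```

Every connection drawn in the figure corresponds to one p:pipe element
in this form.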
The example in Example 1, “A simple, linear XInclude/Validate
pipeline”, is very verbose. It makes all of the connections seen in
the figure explicit. In practice, pipelines do not have to be this
verbose. XProc supports defaults for many common cases:
If you use p:pipeline instead of p:declare-step, the “source” input
port and “result” output port are implicitly declared for you.
Where inputs and outputs are connected between sequential
sibling steps, they do not have to be made explicit.
The same pipeline, using XProc defaults, is shown in Example 2,
“A simple, linear XInclude/Validate pipeline (simplified)”.
Example 2. A simple, linear XInclude/Validate pipeline (simplified)
version="1.0">
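Relying on the two defaults just listed, the simplified form might look
like the following sketch. The pipeline name “xinclude-and-validate” is
an assumption for illustration; only the non-primary “schemas” input
still needs an explicit connection.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            name="xinclude-and-validate"
            version="1.0">
  <!-- "source" and "result" are declared implicitly by p:pipeline -->
  <p:input port="schemas" sequence="true"/>

  <!-- Primary input is connected to the pipeline's "source" by default -->
  <p:xinclude/>

  <!-- Primary input is connected to the preceding step's primary output -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:pipe step="xinclude-and-validate" port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:pipeline>
```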
Figure 2, “A validate and transform pipeline”, is a more complex
example: it performs schema validation with an appropriate schema
and then styles the validated document.
Figure 2. A validate and transform pipeline
The heart of this example is the conditional. The “choose” step
evaluates an XPath expression over a test document. Based on the
result of that expression, one or another branch is run. In this
example, each branch consists of a single validate step.
Example 3. A validate and transform pipeline
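A pipeline with this shape might be sketched as follows. The XPath test
and the file names (“v1schema.xsd”, “v2schema.xsd”, “stylesheet.xsl”)
are hypothetical and stand in for whatever schemas and stylesheet an
application actually uses.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Choose a schema based on an XPath test over the source document -->
  <p:choose>
    <p:when test="/*[@version &lt; 2.0]">
      <p:validate-with-xml-schema>
        <p:input port="schema">
          <p:document href="v1schema.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>
    </p:when>
    <p:otherwise>
      <p:validate-with-xml-schema>
        <p:input port="schema">
          <p:document href="v2schema.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>
    </p:otherwise>
  </p:choose>

  <!-- Style whichever validated document the chosen branch produced -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="stylesheet.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```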
This example, like the preceding, relies on XProc defaults for
simplicity. It is always valid to write the fully explicit form if
you prefer.
The media type for pipeline documents is application/xproc+xml.
Often, pipeline documents are identified by the extension .xpl.
In this specification the words must, must not, should, should not,
may, and recommended are to be interpreted as described in
[RFC 2119].
2 Pipeline Concepts
[Definition: A pipeline is a set of connected steps, with outputs
of one step flowing into inputs of another.] A pipeline is itself a
step and must satisfy the constraints on steps. Connections between
steps occur where the input of one step is connected to the output of
another.
The result of evaluating a pipeline (or subpipeline) is the
result of evaluating the steps that it contains, in an order
consistent with the connections between them. A pipeline must
behave as if it evaluated each step each time it is encountered.
Unless otherwise indicated, implementations must not assume that
steps are functional (that is, that their outputs depend only on
their inputs, options, and parameters) or side-effect free.
The pattern of connections between steps will not always
completely determine their order of evaluation. The evaluation
order of steps not connected to one another is
implementation-dependent.
2.1 Steps
[Definition: A step is the basic computational unit of a
pipeline.] A typical step has zero or more inputs, from
which it receives XML documents to process, zero or more outputs,
to which it sends XML document results, and can have options and/or
parameters.
There are three kinds of steps: atomic, compound, and
multi-container.
[Definition: An atomic step is a step that performs a
unit of XML processing, such as XInclude or transformation, and has
no internal subpipeline.] Atomic steps carry
out fundamental XML operations and can perform arbitrary amounts of
computation, but they are indivisible. An XSLT step, for example,
performs XSLT processing; a Validate with XML Schema step validates
one input with respect to some set of XML Schemas, etc.
There are many types of atomic steps. The standard
library of atomic steps is described in Section 7, “Standard Step
Library”, but implementations may provide others as well. It is
implementation-defined what additional step types, if any, are
provided. Each use, or instance, of an atomic step invokes the
processing defined by that type of step. A pipeline may contain
instances of many types of steps and many instances of the same type
of step.
Compound steps, on the other hand, control and organize the flow
of documents through a pipeline, reconstructing familiar
programming language functionality such as conditionals, iterators
and exception handling. They contain other steps, whose evaluation
they control.
[Definition: A compound step is a step that contains a
subpipeline.] That is, a compound
step differs from an atomic step in that its semantics are at least
partially determined by the steps that it contains.
Finally, there are two “multi-container steps”: p:choose and p:try.
[Definition: A multi-container step is a step that contains
several alternate subpipelines.] Each subpipeline
is identified by a non-step wrapper element: p:when and p:otherwise
in the case of p:choose, p:group and p:catch in the case of p:try.
The output of a multi-container step is the output of exactly
one of its subpipelines. In this sense, a multi-container step
functions like a compound step. However, evaluating a
multi-container step may involve evaluating, or partially
evaluating, more than one of its subpipelines. It's possible for
steps in a partially evaluated pipeline to have side effects that
are visible outside the processor, even if the final output of the
multi-container step is the result of some other subpipeline. For
example, a web server might record that some interaction was
performed, or a file on the local file system might have been
modified.
[Definition: A compound step or multi-container step is a
container for the steps directly within it or
within non-step wrappers directly within it.]
[Definition: The steps that occur directly within, or within
non-step wrappers directly within, a step are called that step's
contained steps. In other words, “container” and “contained steps”
are inverse relationships.]
[Definition: The ancestors of a step, if it has any, are its
container and the ancestors of its container.]
[Definition: Sibling steps (and the connections between them) form
a subpipeline.]
[Definition: The last step in a subpipeline is its last step in
document order.]
subpipeline = p:variable*, (p:for-each | p:viewport | p:choose | p:group | p:try | p:standard-step | pfx:user-pipeline)+
Note
User-defined pipelines (identified with pfx:user-pipeline in the
preceding syntax summary) are atomic. A pipeline declaration may
contain a subpipeline, but the invocation of that pipeline is
atomic and does not contain a subpipeline.
Steps have “ports” into which inputs and outputs are connected.
Each step has a number of input ports and a number of output ports;
a step can have zero input ports and/or zero output ports. (All
steps have an implicit output port for reporting errors that
must not be declared.) The names of
all ports on each step must be unique on that step (you can't have
two input ports named “source”, nor can you have an input port
named “schema” and an output port named “schema”).
A step may have zero or more options, all with unique names.
Steps may have parameter input ports, on which parameters
can be passed. A step can have zero,
one, or many parameter input ports, and each parameter input port
can have zero or more parameters passed on it. If more than one
parameter with the same name is passed to any given parameter input
port, only the last value specified will be available to the step;
the names of the parameters passed to the step are unique on each
port. Parameters with the same name can be passed to different
ports; the uniqueness constraint on names only applies to the
parameters passed on each individual port.
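The rule that only the last value is available can be illustrated with
the following sketch. The stylesheet URI “style.xsl” and the parameter
name “mode” are hypothetical; both p:with-param elements are assumed
to target the primary parameter input port of the p:xslt step.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- Two parameters with the same name on the same parameter port -->
    <p:with-param name="mode" select="'draft'"/>
    <!-- Only this last value, 'final', is available to the step -->
    <p:with-param name="mode" select="'final'"/>
  </p:xslt>
</p:pipeline>
```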
All of the different instances of steps (atomic or compound) in
a pipeline can be distinguished from one another by name. If the
pipeline author does not provide a name for a step, a default name
is manufactured automatically.
2.1.1 Step names
The name attribute on any step can be used to give it a name. The
name must be unique within its scope, see Section 3.2, “Scoping of
Names”.
If the pipeline author does not provide an explicit name, the
processor manufactures a default name. All default names are of the
form “!1.m.n…” where “m” is the position
(in the sense of counting sibling elements) of the step's highest
ancestor element within the pipeline document or library which
contains it, “n” is the position of the next-highest
ancestor, and so on, including both steps and non-step wrappers.
For example, consider the pipeline in Example 3, “A validate and
transform pipeline”. The p:pipeline step has no name, so it gets the
default name “!1”; the p:choose gets the name “!1.1”; the first
p:when gets the name “!1.1.1”; the p:otherwise gets the name
“!1.1.2”, etc. If the p:choose had had a
name, it would not have received a default name, but it would still
have been counted and its first p:when would still have been
“!1.1.1”.
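The naming scheme can be read off a pipeline skeleton. In the following
sketch the structure mirrors the names quoted above; the XPath test is
hypothetical and p:identity is used only as a placeholder so that each
branch contains a step. The comments show the default name assigned to
each unnamed element.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- p:pipeline: !1 -->
  <p:choose>
    <!-- p:choose: !1.1 -->
    <p:when test="/*[@version &lt; 2.0]">
      <!-- first p:when: !1.1.1 -->
      <p:identity/>
    </p:when>
    <p:otherwise>
      <!-- p:otherwise: !1.1.2 -->
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:pipeline>
```

Giving the p:choose an explicit name would replace “!1.1” for that step
only; its children would keep the names shown.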
Providing every step in the pipeline with an interoperable name
has several benefits:
It allows implementors to refer to all steps in an interoperable
fashion, for example, in error messages.
Pragmatically, we say that readable ports are identified by a
step name/port name pair. By manufacturing names for otherwise
anonymous steps, we include implicit connections without changing
our model.
In a valid pipeline that runs successfully to completion, the
manufactured names aren't visible (except perhaps in debugging or
logging output).
Note
The format for defaulted names does not conform to the
requirements of an NCName. This is an
explicit design decision; it prevents pipelines from using the
defaulted names on p:pipe elements. If an explicit connection
is required, the pipeline author must provide an explicit name for
the step.
2.2 Inputs and Outputs
Although some steps can read and write non-XML resources, what
flows between steps through input ports and output ports
are exclusively XML documents or sequences of XML documents.
For the purposes of this specification, an XML document is an
[Infoset]. Implementations are free to transmit
Infosets as sequences of characters, sequences of events, object
models, or any other representation that preserves the necessary
Infoset properties (see Section A.3, “Infoset Conformance”).
Most steps in this specification manipulate XML documents, or
portions of XML documents. In these cases, we speak of changing
elements, attributes, or nodes without prejudice to the actual
representation used by an implementation.
An implementation may make it
possible for a step to produce non-XML output (through channels
other than a named output port), for example by writing a PDF
document to a URI, but that output cannot flow through the pipeline.
Similarly, one can imagine a step that takes no pipeline inputs,
reads a non-XML file from a URI, and produces an XML output. But
the non-XML data cannot arrive on an input port to a step.
It is a dynamic error (err:XD0001) if a non-XML resource is produced
on a step output or arrives on a step input.
The common case is that each step has one or more inputs and one
or more outputs. Figure 3, “An atomic step”, illustrates
symbolically an atomic step with two inputs and one output.
Figure 3. An atomic step
All atomic steps are defined by a p:declare-step.
The declaration of an atomic step type defines the input ports,
output ports, and options of all steps of that type. For example,
every p:validate-with-xml-schema step has two
inputs, named “source” and “schema”, one output named “result”,
and the same set of options.
Like atomic steps, top level, user-defined pipelines also have
declarations. The situation is slightly more complicated for the
other compound steps because they don't have separate declarations;
each instance of the compound step serves as its own declaration.
On these compound steps, the number and names of the outputs can be
different on each instance of the step.
Figure 4, “A compound step”, illustrates
symbolically a compound step with one subpipeline and one output.
As you can see from the diagram, the output from the compound step
comes from one of the outputs of the subpipeline within the
step.
Figure 4. A compound step
[Definition: The input ports declared on a step are its
declared inputs.]
[Definition: The output ports declared on a step are its
declared outputs.]
When a step is used in a pipeline, it is connected to other steps
through its inputs and outputs.
When a step is used, all of the declared inputs of the step
must be connected. Each input can be connected to:
The output port of some other step.
A fixed, inline document or sequence of documents.
A document read from a URI.
One of the inputs declared on one of its ancestors.
A special port provided by an ancestor compound step, for
example, “current” in a p:for-each or p:viewport.
When an input accepts a sequence of documents, the documents can
come from any combination of these locations.
It is a static error (err:XS0003) if any declared input is not
connected.
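Several of these kinds of connection can be combined on one input that
accepts a sequence, as in the following sketch. The step names “main”
and “first”, the wrapper element name “bundle”, the inline “note”
element, and the URI “extra.xml” are all hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" name="main" version="1.0">
  <p:identity name="first"/>

  <p:wrap-sequence wrapper="bundle">
    <p:input port="source">
      <!-- The output port of some other step -->
      <p:pipe step="first" port="result"/>
      <!-- An input declared on an ancestor (the pipeline itself) -->
      <p:pipe step="main" port="source"/>
      <!-- A fixed, inline document -->
      <p:inline><note>fixed content</note></p:inline>
      <!-- A document read from a URI -->
      <p:document href="extra.xml"/>
    </p:input>
  </p:wrap-sequence>
</p:pipeline>
```

The documents arrive on the “source” port as a single sequence, in the
order in which the connections are written.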
The declared outputs of a step may be connected to:
The input port of some other step (including
p:iteration-source or p:viewport-source).
The p:xpath-context of a p:choose or p:when.
An option assigned with p:with-option or a parameter assigned with
p:with-param.
One of the outputs declared on its container.
The primary output port of a step must be connected.
It is a static error (err:XS0005) if the primary output port of any
step is not connected. Other outputs can remain unconnected. Any
documents produced on an unconnected output port are discarded.
Primary input and primary output ports may be implicitly
connected if no explicit connection is given, see
Section 2.3, “Primary Inputs and Outputs”.
Output ports on compound steps have a dual nature: from the
perspective of the compound step's siblings, its outputs are just
ordinary outputs and must be connected as described above. From the
perspective of the subpipeline inside the compound step, they are
inputs into which something may be connected.
Within a compound step, the declared outputs of the step can be
connected to:
The output port of some contained step.
A fixed, inline document or sequence of documents.
A document read from a URI.
If a (non-primary) output port of a compound step is left
unconnected, it produces an empty sequence of documents from the
perspective of its siblings.
Each input and output on a step is declared to accept or produce
either a single document or a sequence of documents. It is not
an error to connect a port that is declared to produce a
sequence of documents to a port that is declared to accept only a
single document. It is, however, an error if the former step
actually produces more than one document at run time.
It is also not an error to connect a port that is declared to
produce a single document to a port that is declared to accept a
sequence. A single document is the same as a sequence of one
document.
An output port may have more than one connection: it may be
connected to more than one input port, more than one of its
container's output ports, or both. At runtime this will result in
distinct copies of the output.
[Definition: The signature of a step is the set of
inputs, outputs, and options that it is declared to accept.]
The declaration for a step provides a fixed signature which all its
instances share.
[Definition: A step matches its signature if and only if it
specifies an input for each declared input, it specifies no inputs
that are not declared, it specifies an option for each option that
is declared to be required, and it specifies no options that are
not declared.]
In other words, every input and required option must be specified
and only inputs and options that are declared may be specified.
Options that aren't required do not have to be specified.
Steps may also produce error, warning, and informative messages.
These messages are captured and provided on the error port inside
of a p:catch. Outside of a try/catch, the disposition of error
messages is implementation-dependent.
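Reading the error port inside a p:catch might look like the following
sketch. The step name “recover” and the schema URI “schema.xsd” are
hypothetical; if validation fails, the pipeline result is the document
that describes the errors instead of the validated source.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:try>
    <p:group>
      <!-- May fail with a step error if the document is invalid -->
      <p:validate-with-xml-schema>
        <p:input port="schema">
          <p:document href="schema.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>
    </p:group>
    <p:catch name="recover">
      <!-- The captured messages are readable on the "error" port -->
      <p:identity>
        <p:input port="source">
          <p:pipe step="recover" port="error"/>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:pipeline>
```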
How inputs are connected to XML documents outside the pipeline is
implementation-defined. In order to be consistent with the XPath
data model, all general and external parsed entities in such
documents must be fully expanded; they must not contain any
representation of [Infoset] [unexpanded entity reference
information items].
How pipeline outputs are connected to XML documents outside the
pipeline is implementation-defined.
2.2.1 External Documents
It's common for some of the documents used in processing a
pipeline to be read from URIs. Sometimes this occurs directly, for
example with a p:document element. Sometimes it occurs
indirectly, for example if an implementation allows the URI of a
pipeline input to be specified on the command line or if a
p:xslt step encounters an xsl:import in the
stylesheet that it is processing. It's also common for some of the
documents produced in processing a pipeline to be written to
locations which have, or at least could have, a URI.
The process of dereferencing a URI to retrieve a document is
often more interesting than it seems at first. On the web, it may
involve caches, proxies, and various forms of indirection.
Resolving a URI locally may involve
resolvers of various sorts and possibly appeal to
implementation-dependent mechanisms such as catalog files.
In XProc, the situation is made even more interesting by the
fact that many intermediate results produced by steps in the
pipeline have base URIs. Whether (and when and how) the
intermediate results that pass between steps are ever written to a
filesystem is implementation-dependent.
In Version 1.0 of XProc, how
(or if) implementers provide local resolution mechanisms and how
(or if) they provide access to intermediate results by URI is
implementation-defined.
Version 1.0 of XProc does not require implementations to
guarantee that multiple attempts to dereference the same URI always
produce consistent results.
Note
On the one hand, this is a somewhat unsatisfying state of
affairs because it leaves room for interoperability problems. On
the other, it is not expected to cause such problems very often in
practice.
If these problems arise in practice, implementers are encouraged
to use the existing extension mechanisms to give users the control
needed to circumvent them. Should such mechanisms become
widespread, a standard mechanism could be added in some future
version of the language.
2.2.2 Non-XML Documents
XProc is designed to allow pipeline authors to specify how an
XML document, or sequence of XML documents, flows through a series
of steps. For the most part, non-XML documents are considered
out-of-scope.
However, to be useful, XProc pipelines must interact with the
real world where non-XML documents (HTML documents, raster images,
non-XML encodings of data, etc.) are a fact of life.
Accordingly, some pipelines may need to access non-XML documents
and some non-XML documents may “leak” into pipelines. XProc
provides a limited set of tools for processing these documents. In
particular, XProc offers the ability to turn some “almost-XML”
documents into XML and to allow some non-XML documents to flow
quietly through the pipeline.
It is not a goal of XProc that it should be a
general-purpose pipeline language for manipulating arbitrary,
non-XML resources.
There are two standard ways that a non-XML document may enter a
pipeline: directly through p:data or as the result of performing a
p:http-request step. Loading non-XML data
with a computed URI requires the p:http-request
step. Implementors are encouraged to support the file:
URI scheme so that users can load local data
from computed URIs.
In either case, non-XML documents are converted into text or are
base64-encoded, depending on their content type and character
encoding. The result is an XML document that consists of a document
element containing either escaped text or base64-encoded text. This
document can be processed like any other XML document.
The p:unescape-markup step can be used to
(attempt to) convert a non-XML document into XML. Well-formed XML
that just happens to be represented with escaped markup can always
be recovered. For other media types, the ability to construct XML
and the precise mechanisms used to make the markup well-formed are
implementation-defined.
XProc provides no standard means to save encoded data in its
unencoded binary form. Implementors may provide extension methods
to allow the p:store step to save the binary data. For
example, an implementation might provide an ext:binary
serialization method that decodes base64-encoded data before
saving it.
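Such an extension method might be used as in the following sketch. The
“ext” namespace URI and the target “image.png” are hypothetical, and
“ext:binary” is an implementation extension, not part of the standard
serialization methods.

```xml
<!-- Assumes the base64-encoded document arrives on the primary input -->
<p:store xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:ext="http://example.com/ns/xproc-extensions"
         href="image.png"
         method="ext:binary"/>
```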
2.3 Primary Inputs and Outputs
As a convenience for pipeline authors, each step may have one
input port designated as the primary input port and one output port
designated as the primary output port.
[Definition: If
a step has a document input port which is explicitly marked
“primary='true'”, or if it has exactly one
document input port and that port is not explicitly marked
“primary='false'”, then that input port is
the primary input port of the step.]
If a step has a single input port and that port is
explicitly marked “primary='false'”, or if a
step has more than one input port and none is explicitly marked as
the primary, then the primary input port of that step is undefined.
A step can have at most one primary input port.
[Definition:
If a step has a document output port which is explicitly marked
“primary='true'”, or if it has exactly one
document output port and that port is not explicitly
marked “primary='false'”, then that output
port is the primary output port of the step.]
If a step has a single output port and that port is
explicitly marked “primary='false'”, or if a
step has more than one output port and none is explicitly marked as
the primary, then the primary output port of that step is
undefined. A step can have at most one primary output port.
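These rules can be seen in the following sketch of two atomic step
declarations. The step types “ex:merge” and “ex:audit”, their
namespace, and their port names are hypothetical.

```xml
<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:ex="http://example.com/ns/steps"
           version="1.0">

  <p:declare-step type="ex:merge">
    <!-- Two inputs: "source" is primary because it is marked so -->
    <p:input port="source" primary="true"/>
    <p:input port="overrides"/>
    <!-- Exactly one output, not marked false: primary by default -->
    <p:output port="result"/>
  </p:declare-step>

  <p:declare-step type="ex:audit">
    <!-- Exactly one input, not marked false: primary by default -->
    <p:input port="source"/>
    <!-- Single output marked false: the primary output is undefined -->
    <p:output port="log" primary="false"/>
  </p:declare-step>

</p:library>
```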
The special significance of primary input and output ports is
that they are connected automatically by the processor if no
explicit connection is given. Generally speaking, if two steps
appear sequentially in a subpipeline, then the primary output of
the first step will automatically be connected to the primary input
of the second.
Additionally, if a compound step has no declared outputs and the
last step in its subpipeline has an unconnected primary output,
then an implicit primary output port will be added to the compound
step (and consequently the last step's primary output will be
connected to it). This implicit output port has no name. It inherits
the sequence property of the port connected to it. This rule does
not apply to p:declare-step; step declarations must provide explicit
names for all of their outputs.
2.4 Connections
Steps are connected together by their input ports and output
ports.
It is a static error (err:XS0001) if there are any loops in the
connections between steps: no step can be connected to itself nor
can there be any sequence of connections through other steps that
leads back to itself.
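A minimal loop looks like the following sketch, in which the step names
“a” and “b” are hypothetical. Each step reads the other's output, so a
processor is expected to reject the pipeline with err:XS0001 before
running it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- "a" reads from "b" ... -->
  <p:identity name="a">
    <p:input port="source">
      <p:pipe step="b" port="result"/>
    </p:input>
  </p:identity>
  <!-- ... and "b" reads from "a": a loop in the connections -->
  <p:identity name="b">
    <p:input port="source">
      <p:pipe step="a" port="result"/>
    </p:input>
  </p:identity>
</p:declare-step>
```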
2.4.1 Namespace Fixup on Outputs
XProc processors are expected, and sometimes required, to
perform namespace fixup. Unless the
semantics of a step explicitly say otherwise:
The in-scope namespaces associated with a node (even those that
are inherited from namespace bindings that appear among its
ancestors in the document in which it appears initially) are
assumed to travel with that node.
Changes to one part of a tree (wrapping or unwrapping a node or
renaming an element, for example) do not change the in-scope
namespaces associated with the descendants of the node so
changed.
As a result, some steps can produce XML documents which have no
direct serialization (because they include nodes with conflicting
or missing namespace declarations, for example).
[Definition: To produce a serializable XML
document, the XProc processor must sometimes add additional
namespace nodes, perhaps even renaming prefixes, to satisfy the
constraints of [Namespaces in XML]. This process is
referred to as namespace fixup.]
Implementors are encouraged to perform
namespace fixup
before passing documents between steps, but they are not required
to do so. Conversely, an implementation which
does
serialize between steps and therefore must perform such fixups, or
reject documents that cannot be serialized, is also conformant.
Except where the semantics of a step explicitly require changes,
processors are required to preserve the information in the
documents and fragments they manipulate. In particular, the
information corresponding to the [Infoset] properties
[attributes], [base URI], [children], [local name],
[namespace name], [normalized value], [owner], and [parent]
must be preserved.
The information corresponding to [prefix], [in-scope namespaces],
[namespace attributes], and [attribute type] should
be preserved, with changes to the first
three only as required for namespace fixup. In particular,
processors are encouraged to take account of prefix information in
creating new namespace bindings, to minimize negative impact on
prefixed names in content.
Except for cases which are specifically called out in
Section 7, “Standard Step Library”, the extent to which
namespace fixup, and other checks
for outputs which cannot be serialized, are performed on
intermediate outputs is implementation-defined.
Whenever an implementation serializes pipeline contents, for
example for pipeline outputs, logging, or as part of steps such as
p:store or p:http-request, it is a dynamic error
if that serialization could not be
done so as to produce a document which is both well-formed and
namespace-well-formed, as specified in [XML] and
[Namespaces in XML], regardless of what serialization method,
if any, is called for.
2.5 Environment
[Definition: The
environment
is a context-dependent
collection of information available within subpipelines.]
Most of the information in the environment is static and can be
computed for each subpipeline before evaluation of the pipeline as
a whole begins. The in-scope bindings have to be calculated as the
pipeline is being evaluated.
The environment consists of:
A set of readable ports.
[Definition: The
readable
ports
are a set of step name/port name pairs.]
Inputs
and outputs can only be connected to readable ports.
A default readable port.
[Definition: The
default readable port
, which may be undefined, is
a specific step name/port name pair from the set of readable
ports.]
A set of in-scope bindings.
[Definition: The
in-scope bindings
are a set of name-value pairs,
based on
option
and
variable
bindings.]
[Definition: The
empty environment
contains no readable
ports, an undefined default readable port and no in-scope
bindings.]
Unless otherwise specified, the environment of a
contained step is its inherited environment.
[Definition: The inherited environment of a contained
step is an environment that is the same as the environment
of its container with the standard modifications.]
The standard modifications made to an inherited environment are:
The declared inputs of the container are added to the
readable ports.
In other words, contained steps can see the inputs to their
container.
The union of all the declared outputs of all of the step's
sibling steps is added to the readable ports.
In other words, sibling steps can see each other's outputs in
addition to the outputs visible to their container.
If there is a preceding sibling step element:
If that preceding sibling has a primary output port, then that
output port becomes the default readable port.
Otherwise, the default readable port is undefined.
If there is not a preceding sibling step element:
If the container has a primary input port, the default
readable port is that primary input port.
Otherwise, the default readable port is unchanged.
The names and values from each p:variable
present at the beginning of the
container are added, in document order, to the in-scope
bindings. A new binding replaces an old binding with the
same name. See Section 5.7.1, “p:variable” for the
specification of variable evaluation.
A step with no parent inherits the empty environment.
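The rules for the default readable port can be sketched as a single pass over the steps of a subpipeline. This is an illustration only: the function name, the (step name, port name) tuples, and the use of None for "undefined" are assumptions made for the example.

```python
def default_readable_ports(container_name, container_primary_input,
                           container_default, steps):
    """Map each contained step to its default readable port.

    container_primary_input: port name, or None if the container has no
        primary input port.
    container_default: the container's own default readable port as a
        (step name, port name) pair, or None if undefined.
    steps: list of (step name, primary output port name or None), in
        document order.
    """
    result = {}
    previous = None  # the preceding sibling step, if any
    for name, primary_output in steps:
        if previous is not None:
            prev_name, prev_primary = previous
            # Preceding sibling: its primary output, else undefined.
            result[name] = (prev_name, prev_primary) if prev_primary else None
        elif container_primary_input is not None:
            # First step: the container's primary input port.
            result[name] = (container_name, container_primary_input)
        else:
            # First step, container has no primary input: unchanged.
            result[name] = container_default
        previous = (name, primary_output)
    return result
```

For example, with a container named "main" whose primary input is "source", a first step reads from ("main", "source") by default, and a step that follows a sibling with no primary output has no default readable port.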
2.6 XPaths in XProc
XProc uses XPath as an expression language. XPath expressions
are evaluated by the XProc processor in several places: on compound
steps, to compute the default values of options and the values of
variables; on atomic steps, to compute the actual values of options
and the values of parameters.
XPath expressions are also passed to some steps. These
expressions are evaluated by the implementations of the individual
steps.
This distinction can be seen in the following example:
The select expression on the variable “
” is evaluated by the XProc processor. The value
of the variable is “
”.
The
href
option of the
p:load
step is evaluated
by the XProc processor. The actual
href
option received by the step is simply the string literal
”. (The
select expression on the
source
input of
the
p:split-sequence
step is also evaluated by
the XProc processor.)
The XPath expression “@role='chapter'” is passed literally to the
test
option on
the
p:split-sequence
step. That's because the
nature of the
p:split-sequence
is that
it
evaluates
the expression. Only some options on some steps
expect XPath expressions.
The XProc processor evaluates all of the XPath expressions in
select
attributes on variables,
options, parameters, and inputs, in
match
attributes on
p:viewport
, and in
test
attributes on
p:when
steps.
An XProc implementation can use
either [XPath 1.0] or [XPath 2.0] to evaluate
these expressions.
Note
Allowing either XPath 1.0 or XPath 2.0 is a compromise driven
entirely by the timing of XProc development. During the development
of this specification, the community indicated that it was too
early to mandate that all implementations use XPath 2.0 and too
late to mandate that all implementations use XPath 1.0.
Many, many expressions that are likely to be used in XProc
pipelines are the same in both versions (simple element tests,
ancestor and descendant tests, string-based attribute tests,
etc.).
As an aid to interoperability, pipeline authors may indicate the
version of XPath that they require. The attribute
xpath-version may be used on p:pipeline, p:declare-step,
or p:library to identify the XPath version that must
be used to evaluate XPath expressions on the
pipeline(s). The attribute is lexically scoped, but see below.
Note
In Version 1.0 of XProc, no similar level of control is provided
for specifying (or testing) the version of XSLT used when
evaluating XSLTMatchPatterns. The expectation
is that XPath 1.0 processors will be using XSLT 1.0 match patterns
and XPath 2.0 processors will be using XSLT 2.0 match patterns, but
that is not necessarily the case.
As XPath, XSLT, and XProc continue to evolve, additional
facilities for specifying and testing the version of XSLT used to
evaluate match patterns may be added to XProc.
If an xpath-version is specified
on a p:pipeline or p:declare-step,
then that is the version of XPath that the step requires. If it
does not specify a version, but a version is specified on one of
its ancestors, the nearest ancestor version specified is the
version that it requires. An
xpath-version
attribute on a
p:library
specifies a
default version for all steps defined in that library.
If no version is specified on the step
or among its ancestors, then its XPath version is
implementation-defined.
Note
The decision about which XPath version applies can be made
dynamically. For example, if a pipeline explicitly labeled with
xpath-version
“1.0” imports a
library that does not specify a version, the implementation may
elect to make the implementation-defined XPath version of the steps
in the library also “1.0”. If the same implementation imports that
library into a pipeline explicitly labeled with
xpath-version
“2.0”, it can make the
implementation-defined version of those steps “2.0”.
The following rules determine how the indicated version and the
implementation's actual version interact:
If the indicated version and the implementation version are the
same, then that version is used.
If the indicated version is 1.0 and the implementation uses
XPath 2.0 (or later), the expression
must
be evaluated in XPath 1.0 compatibility mode.
It is a dynamic error (err:XD0024) if a 2.0 processor encounters an
XPath 1.0 expression and it does not support XPath 1.0
compatibility mode.
Otherwise:
It is a dynamic error (err:XD0027) if the
processor encounters an xpath-version
that it does not support.
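One way to read the scoping and interaction rules above is sketched below. The helper names, the XProcDynamicError class, and the modelling of an implementation as a single version string plus a compatibility-mode flag are assumptions made for illustration; the sketch follows the three rules as listed and nothing more.

```python
class XProcDynamicError(Exception):
    """Hypothetical carrier for an XProc dynamic error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def required_xpath_version(declared_versions):
    """Return the nearest specified xpath-version, or None.

    declared_versions lists the xpath-version values found on the step
    itself and then on each ancestor, nearest first, with None wherever
    the attribute is absent. None as a result means the version is
    implementation-defined.
    """
    for version in declared_versions:
        if version is not None:
            return version
    return None


def choose_evaluation_mode(indicated, implementation, supports_compat_mode):
    """Return (XPath version used, whether 1.0 compatibility mode is on)."""
    if indicated == implementation:
        return implementation, False
    if indicated == "1.0" and float(implementation) >= 2.0:
        if not supports_compat_mode:
            raise XProcDynamicError(
                "err:XD0024", "XPath 1.0 compatibility mode is not supported")
        return implementation, True
    raise XProcDynamicError(
        "err:XD0027", f"unsupported xpath-version {indicated}")
```

Under these assumptions, a 2.0 processor asked for version 1.0 evaluates in compatibility mode when it can, and a 1.0 processor asked for version 2.0 falls through to err:XD0027.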
XProc processors divide naturally into two classes: XPath 1.0
processors and XPath 2.0 processors.
Irrespective of which version of XPath is used, all expressions
evaluated by XProc or passed to steps for evaluation must be valid
XPath expressions.
It is a dynamic error (err:XD0023) if an
XPath expression is encountered which cannot be evaluated (because
it is syntactically incorrect, contains references to unbound
variables or unknown functions, or for any other reason).
2.6.1 XPath 1.0 processors
XProc processors that support only XPath 1.0 do not support any
types (or features) beyond those described in [
XPath 1.0
]. They use the
XPath 1.0 data model. Processors
must
implement all of the XPath 1.0 functions, but are not expected to
implement the functions described in [
XSLT 1.0
].
2.6.1.1 Processor XPath Context
When the XProc processor evaluates an XPath expression using
XPath 1.0, unless otherwise indicated by a particular step, it does
so with the following initial context:
context node
The document node of a document. The document is either
specified with a connection or is taken from the default
readable port.
It is a dynamic error (err:XD0008) if a
document sequence appears where a document to be used as the
context node is expected.
If there is no explicit connection and there is no default
readable port then the context node is an empty document node.
context position and context size
The context position and context size are both “1”.
variable bindings
The union of the in-scope
specified options
and variables
are available as variable bindings to the XPath processor.
Note
An option that has neither a specified value nor a default value
will not appear as an in-scope variable. Consequently, an attempt
to refer to that variable will raise an error.
function library
The [XPath 1.0] core function library and the functions described in
Section 2.7, “XPath Extension Functions”. Function names that do
not contain a colon always refer to the XPath 1.0 functions; any
in-scope binding for the default namespace does not apply.
in-scope namespaces
The namespace bindings in-scope on the element where the
expression occurred.
2.6.1.2 Step XPath Context
When
a step
evaluates an XPath expression using XPath
1.0, unless otherwise indicated by a particular step, it does so
with the following initial context:
context node
The document node that appears on the primary input port of the
step, unless otherwise specified by the step.
context position and context size
The position and size are both “1”, unless otherwise specified
by the step.
variable bindings
None, unless otherwise specified by the step.
function library
The [XPath 1.0] core function library, unless otherwise specified
by the step. Function names that do not contain a colon always refer
to the XPath 1.0 functions; any in-scope binding for the default
namespace does not apply.
in-scope namespaces
The set of namespace bindings provided by the XProc processor.
The processor computes this set of bindings by taking a union of
the bindings on the step element itself as well as the bindings on
any of the options and parameters used in computing values for the
step (see
Section 5.7.5,
“Namespaces on variables, options, and parameters”
).
The results of computing the
union of namespaces in the presence of conflicting declarations for
a particular prefix are implementation-dependent.
Note
Some steps may also provide for implementation-defined or
implementation-dependent amendments to the contexts. Those
amendments are in addition to any specified by XProc.
2.6.2 XPath 2.0 processors
XProc processors that support XPath 2.0 are XPath 2.0
processors. Such processors can refer to the primitive atomic
schema types, but cannot import additional types.
XPath 2.0 processors
must
implement
all of the XPath 2.0 functions, but are not expected to implement
the functions described in [
XSLT 2.0
].
2.6.2.1 Processor XPath Context
When the XProc processor evaluates an XPath expression using
XPath 2.0, unless otherwise indicated by a particular step, it does
so with the following static context:
XPath 1.0 compatibility mode
Is true if the indicated XPath version is 1.0, false
otherwise.
Statically known namespaces
The namespace declarations in-scope for the containing
element.
Default element/type namespace
The null namespace.
Default function namespace
The [XPath 2.0] function namespace. Function names that do not
contain a colon always refer to the default function namespace; any
in-scope binding for the default namespace does not apply.
This specification does not provide a mechanism to override the
default function namespace.
In-scope schema definitions
A basic XPath 2.0 XProc processor includes the following named
type definitions in its in-scope schema definitions:
All the primitive atomic types defined in [W3C XML Schema: Part 2],
with the exception of xs:NOTATION. That is:
xs:string, xs:boolean, xs:decimal, xs:double, xs:float,
xs:date, xs:time, xs:dateTime, xs:duration, xs:QName,
xs:anyURI, xs:gDay, xs:gMonthDay, xs:gMonth, xs:gYearMonth,
xs:gYear, xs:base64Binary, and xs:hexBinary.
The derived atomic type xs:integer defined in
[W3C XML Schema: Part 2].
The types xs:anyType, xs:anySimpleType, xs:yearMonthDuration,
xs:dayTimeDuration, xs:anyAtomicType, xs:untyped, and
xs:untypedAtomic defined in
[XQuery 1.0 and XPath 2.0 Data Model (XDM)].
In-scope variables
The union of the in-scope
specified options
and variables
are available as variable bindings to the XPath processor.
Note
An option that has neither a specified value nor a default value
will not appear as an in-scope variable. Consequently, an attempt
to refer to that variable will raise an error.
Context item static type
Document.
Function signatures
The signatures of the [XPath 2.0 Functions and Operators] and the
functions described in Section 2.7, “XPath Extension Functions”.
Statically known collations
Implementation-defined but
must
include the Unicode code point collation.
The version of Unicode supported is
implementation-defined
, but
it is recommended that the most recent version of Unicode be
used.
Default collation
Unicode code point collation.
Base URI
The base URI of the element on which the expression occurs.
Statically known documents
None.
Statically known collections
None.
And the following dynamic context:
context item
The document node of a document. The document is either
specified with a connection or is taken from the default
readable port.
It is a dynamic error (err:XD0008) if a
document sequence appears where a document to be used as the
context node is expected.
If there is no explicit connection and there is no default
readable port then the context node is undefined.
context position and context size
The context position and context size are both “1”.
Variable values
The union of the in-scope options and variables are available as
variable bindings to the XPath processor.
Function implementations
The [XPath 2.0 Functions and Operators] and the functions described
in Section 2.7, “XPath Extension Functions”.
Current dateTime
The point in time returned as
the current dateTime is implementation-defined.
Implicit timezone
The implicit timezone is implementation-defined.
Available documents
The set of available
documents (those that may be retrieved with a URI) is
implementation-dependent.
Available collections
The set of available
collections is implementation-dependent.
Default collection
None.
2.6.2.2 Step XPath Context
When a step evaluates an XPath expression using XPath 2.0,
unless otherwise indicated by a particular step, it does so with
the following static context:
XPath 1.0 compatibility mode
Is true if the indicated XPath version is 1.0, false
otherwise.
Statically known namespaces
The namespace declarations in-scope for the containing element
or made available through p:namespaces.
Default element/type namespace
The null namespace.
Default function namespace
The [XPath 2.0] function namespace. Function names that do not
contain a colon always refer to the default function namespace; any
in-scope binding for the default namespace does not apply.
This specification does not provide a mechanism to override the
default function namespace.
In-scope schema definitions
The same as in Section 2.6.2.1, “Processor XPath Context”.
In-scope variables
None, unless otherwise specified by the step.
Context item static type
Document.
Function signatures
The signatures of the [
XPath 2.0 Functions and Operators
].
Statically known collations
Implementation-defined but
must
include the Unicode code point collation.
Default collation
Unicode code point collation.
Base URI
The base URI of the element on which the expression occurs.
Statically known documents
None.
Statically known collections
None.
And the following initial dynamic context:
context item
The document node of the document that appears on the primary
input of the step, unless otherwise specified by the step.
context position and context size
The context position and context size are both “1”, unless
otherwise specified by the step.
Variable values
None, unless otherwise specified by the step.
Function implementations
The [
XPath 2.0
Functions and Operators
].
Current dateTime
An implementation-defined point in time.
Implicit timezone
The implicit timezone is implementation-defined.
Available documents
The set of available
documents (those that may be retrieved with a URI) is
implementation-dependent.
Available collections
None.
Default collection
None.
Note
Some steps may also provide for implementation-defined or
implementation-dependent amendments to the contexts. Those
amendments are in addition to any specified by XProc.
2.7 XPath Extension Functions
The XProc processor
must
support
the additional functions described in this section in XPath
expressions evaluated by the processor.
In the following descriptions, the names of types (string,
boolean, etc.) should be
taken to mean the corresponding [W3C XML Schema: Part 2] data
types for an implementation that uses XPath 2.0
and as the most appropriate XPath 1.0 types for an XPath 1.0
implementation.
2.7.1 System Properties
XPath expressions within a pipeline document can interrogate the
processor for information about the current state of the pipeline.
Various aspects of the processor are exposed through the
p:system-property
function in the pipeline
namespace:
p:system-property($property as xs:string) as xs:string
The $property string must have the form of a QName;
the QName is expanded into a name using the namespace declarations
in scope for the expression.
It is a dynamic error (err:XD0015) if the
specified QName cannot be resolved with the in-scope namespace
declarations. The
p:system-property
function returns the string
representing the value of the system property identified by the
QName. If there is no such property, the empty string
must
be returned.
Implementations
must
provide the
following system properties, which are all in the XProc
namespace:
p:episode
Returns a string which
should
be
unique for each invocation of the pipeline processor. In other
words, if a processor is run several times in succession, or if
several processors are running simultaneously, each invocation of
each processor should get a distinct value from p:episode.
The unique identifier must be a valid XML name.
p:language
Returns a string which identifies the current language, for
example, for message localization purposes.
The exact format of the language string is
implementation-defined
but
should
be consistent with the
xml:lang
attribute.
p:product-name
Returns a string containing the name of the implementation, as
defined by the implementer. This should normally remain constant
from one release of the product to the next. It should also be
constant across platforms in cases where the same source code is
used to produce compatible products for multiple execution
platforms.
p:product-version
Returns a string identifying the version of the implementation,
as defined by the implementer. This should normally vary from one
release of the product to the next, and at the discretion of the
implementer it may also vary across different execution
platforms.
p:vendor
Returns a string which identifies the vendor of the
processor.
p:vendor-uri
Returns a URI which identifies the vendor of the processor.
Often, this is the URI of the vendor's web site.
p:version
Returns the version(s) of XProc implemented by the processor as
a space-separated list. For example, a processor that supports
XProc 1.0 would return “1.0”; a processor that supports XProc 1.0
and 2.0 would return “1.0 2.0”; a processor that supports only
XProc 2.0 would return “2.0”.
p:xpath-version
Returns the version(s) of XPath implemented by the processor for
evaluating XPath expressions on XProc elements. The result is a
space-separated list of versions supported. For example, a
processor that only supports XPath 1.0 would return “1.0”; a
processor that supports XPath 2.0 and XPath 1.0 backwards
compatibility mode could return “1.0 2.0”; a processor that
supports only XPath 2.0 would return “2.0”.
p:psvi-supported
Returns true if the implementation supports passing PSVI
annotations between steps, false otherwise.
Implementations may support additional system properties but
such properties
must
be in a namespace
and
must not
be in the XProc
namespace.
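The lookup behaviour described for p:system-property can be sketched as follows. The table of property values, the helper names, and the XProcDynamicError class are hypothetical, and treating an unprefixed name as being in no namespace is an assumption of this sketch rather than something stated above.

```python
XPROC_NS = "http://www.w3.org/ns/xproc"


class XProcDynamicError(Exception):
    """Hypothetical carrier for an XProc dynamic error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def expand_qname(qname, namespaces):
    """Expand 'prefix:local' to (namespace URI, local name)."""
    prefix, sep, local = qname.rpartition(":")
    if not sep:
        # Assumption: an unprefixed name is in no namespace.
        return (None, qname)
    if prefix not in namespaces:
        raise XProcDynamicError(
            "err:XD0015", f"no in-scope namespace for prefix '{prefix}'")
    return (namespaces[prefix], local)


def system_property(qname, namespaces, properties):
    """Return the property value, or the empty string if it is unknown."""
    return properties.get(expand_qname(qname, namespaces), "")
```

With an illustrative table such as {(XPROC_NS, "version"): "1.0"} and the prefix p bound to the XProc namespace, system_property("p:version", ...) yields "1.0", while a name with no matching property yields the empty string instead of an error.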
2.7.2 Step Available
The
p:step-available
function reports whether or
not a particular type of step is understood by the processor.
p:step-available($step-name as xs:string) as xs:boolean
The $step-name string must
have the form of a QName; the QName is
expanded into a name using the namespace declarations in-scope for
the expression. The
p:step-available
function returns true if and
only if the processor knows how to evaluate steps of the specified
type.
2.7.3 Value Available
The
p:value-available
function reports whether or
not a particular in-scope option has a value.
p:value-available($option-name as xs:string) as xs:boolean
p:value-available($option-name as xs:string, $fail-if-unknown as xs:boolean) as xs:boolean
The
$option-name
string
must
have the form of a QName; the QName is
expanded into a name using the namespace declarations in-scope for
the expression. The
p:value-available
function returns true if and
only if the name specified is the name of an
in-scope
binding
and the binding has a value.
It is a dynamic error (err:XD0033) if the name specified is not the
name of an in-scope option or variable.
In the two-argument form, it is not an error to specify a name
that is not the name of an in-scope option or variable if
$fail-if-unknown
is false; the function
simply returns false. The semantics of the two-argument form when
$fail-if-unknown
is true are precisely the
same as the single argument form.
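The difference between the one-argument and two-argument forms can be sketched as below. Plain strings stand in for expanded names, and the split between a set of in-scope names and a separate mapping of the names that actually have values is a modelling assumption; the function and class names are hypothetical.

```python
class XProcDynamicError(Exception):
    """Hypothetical carrier for an XProc dynamic error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def value_available(name, in_scope_names, values, fail_if_unknown=True):
    """Report whether the in-scope option or variable `name` has a value.

    in_scope_names: names of all in-scope options and variables.
    values: mapping holding only the names that currently have a value.
    The default fail_if_unknown=True models the single-argument form.
    """
    if name not in in_scope_names:
        if fail_if_unknown:
            raise XProcDynamicError(
                "err:XD0033", f"'{name}' is not an in-scope option or variable")
        return False
    return name in values
```

Given in-scope names {"path", "depth"} of which only "depth" has a value, the sketch returns true for "depth", false for "path", and either false or err:XD0033 for an unknown name, depending on the second argument.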
Consider the following example:
If the
path
option is specified in the
call to
ex:dir-list
, then the first
p:when
clause will be
evaluated and the specified value will be used. If the option is
not specified, then the
p:otherwise
clause will be evaluated and
" will be used instead.
2.7.4 Iteration Position
Both
p:for-each
and
p:viewport
process a
sequence of documents. The iteration position is the position of
the current document in that sequence: the first document has
position 1, the second 2, etc. The
p:iteration-position
function returns the
iteration position of the nearest ancestor
p:for-each or p:viewport.
p:iteration-position() as xs:integer
If there is no
p:for-each
or
p:viewport
among the
ancestors of the element on which the expression involving
p:iteration-position
occurs, it returns 1.
2.7.5 Iteration Size
Both
p:for-each
and
p:viewport
process a
sequence of documents. The iteration size is the total number of
documents in that sequence. The
p:iteration-size
function returns the iteration size of the nearest ancestor
p:for-each or p:viewport.
p:iteration-size() as xs:integer
If there is no
p:for-each
or
p:viewport
among the
ancestors of the element on which the expression involving
p:iteration-size
occurs, it returns 1.
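The behaviour of both iteration functions can be sketched with a stack of frames, one per enclosing p:for-each or p:viewport, innermost last. The frame representation and the run_for_each helper are assumptions made for the example.

```python
def iteration_position(frames):
    """Position in the nearest enclosing iteration, or 1 if there is none."""
    return frames[-1][0] if frames else 1


def iteration_size(frames):
    """Size of the nearest enclosing iteration, or 1 if there is none."""
    return frames[-1][1] if frames else 1


def run_for_each(documents, frames=()):
    """Visit each document, recording the position and size seen inside."""
    seen = []
    for position, doc in enumerate(documents, start=1):
        # Push a (position, size) frame for this pass over the sequence.
        inner = list(frames) + [(position, len(documents))]
        seen.append((doc, iteration_position(inner), iteration_size(inner)))
    return seen
```

Because only the innermost frame is consulted, a nested iteration hides the position and size of the outer one, which matches the "nearest ancestor" wording above.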
2.7.6 Base URI
Returns the base URI of the specified node, if it has one. This
function provides an interoperable way for XPath 1.0 based
processors to access the base URI of a node. It is conceptually the
same as the XPath 2.0
fn:base-uri()
function.
p:base-uri() as xs:string
p:base-uri($node as node()) as xs:string
If no argument is specified, the context node is taken to be the
argument.
This function returns the
[base-uri]
property of its argument, or the empty string if no base URI is
defined for that argument or argument type.
Note
This function is defined in our namespace because it would be
inappropriate to require XPath 1.0 based processors to support the
fn:base-uri
function; its semantics are
deeply rooted in the XPath 2.0 data model which differs from the
XPath 1.0 data model.
2.7.7 Resolve URI
Resolves a relative URI with respect to a particular base URI.
This function provides an interoperable way for XPath 1.0 based
processors to compose URI references. It is conceptually the same
as the XPath 2.0
fn:resolve-uri()
function.
p:resolve-uri($relative as xs:string) as xs:string
p:resolve-uri($relative as xs:string, $base as xs:string) as xs:string
If no base is specified, the base URI of the context node is
used.
Note
This function is defined in our namespace because it would be
inappropriate to require XPath 1.0 based processors to support the
fn:resolve-uri
function; its semantics are
rooted in the XPath 2.0 data model which differs from the XPath 1.0
data model.
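The effect of resolving a relative reference against a base can be illustrated with Python's standard urllib.parse.urljoin. This is an approximation for illustration only: the resolve_uri wrapper and its context_base_uri parameter are hypothetical, and urljoin is not guaranteed to agree with fn:resolve-uri in every edge case.

```python
from urllib.parse import urljoin


def resolve_uri(relative, base=None, context_base_uri=""):
    """Resolve `relative` against `base`.

    When no base is given, the base URI of the context node is used; it is
    passed in here explicitly as context_base_uri.
    """
    if base is None:
        base = context_base_uri
    return urljoin(base, relative)
```

For instance, resolving "chapter1.xml" against "http://example.com/docs/book.xml" gives "http://example.com/docs/chapter1.xml", and a reference that is already absolute is returned unchanged.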
2.7.8 Version Available
Returns true if and only if the processor supports the version
specified.
p:version-available($version as xs:decimal) as xs:boolean
A version 1.0 processor will return
true()
when
p:version-available(1.0)
is evaluated.
2.7.9 XPath Version Available
Returns true if and only if the processor supports the XPath
version specified.
p:xpath-version-available($version as xs:decimal) as xs:boolean
A processor that supports XPath 2.0 will return
true()
when
p:xpath-version-available(2.0)
is evaluated.
2.7.10 Other XPath Extension Functions
It is
implementation-defined
if the
processor supports any other XPath extension functions. Additional
extension functions, if any,
must not
use any of the XProc namespaces.
2.8 PSVIs in XProc
XML documents flow between steps in an XProc pipeline.
Section A.3, “Infoset Conformance”
identifies the properties of those documents that
must
be available. Implementations
may
also have the ability to pass PSVI annotations
between steps.
Whether or not the pipeline
processor supports passing PSVI annotations between steps is
implementation-defined.
The exact PSVI properties that
are preserved when documents are passed between steps are
implementation-defined.
A pipeline can use the
p:psvi-supported
system property to determine whether or not PSVI properties can be
passed between steps.
A pipeline can assert that PSVI support is required with the
psvi-required
attribute:
On a p:pipeline or p:declare-step,
psvi-required indicates whether or
not the declared step requires PSVI support.
It is a dynamic error (err:XD0022) if a processor that does not
support PSVI annotations attempts to invoke a step which asserts
that they are required.
On a
p:library
, the
psvi-required
attribute provides a default
value for all of its
p:pipeline
and
p:declare-step
children
that do not specify a value themselves.
Many of the steps that an XProc pipeline can use are
transformative in nature. The
p:delete
step, for example, can remove
elements and attributes; the
p:label-elements
step can add attributes;
etc. If PSVI annotations were always preserved, the use of such
steps could result in documents that were inconsistent with their
schema annotations.
In order to avoid these inconsistencies, most steps
must not
produce PSVI annotated results even when
PSVI passing is supported.
If PSVI passing is supported, the following constraints
apply:
Implementations
must
faithfully
transmit any PSVI properties produced on step outputs to the steps
to which they are connected.
When only a subset of the input is processed by a step (because
a select expression appears on an
input port or a
match
expression is
used to process only part of the input), any PSVI annotations that
appear on the selected input
must
be
preserved in the resulting documents passed to the step.
Note that ID/IDREF constraints, and any other whole-document
constraints, may not be satisfied within the selected portion,
irrespective of what its PSVI properties claim.
If an output of a compound step is connected to an output which
includes PSVI properties, those properties
must
be preserved on the output of the compound
step,
except
for the output of
p:viewport
which
must not
contain any PSVI
properties.
If an implementation supports XPath 2.0, the data model
constructed with which to evaluate XPath expressions and match
patterns
should
take advantage of as
much PSVI information as possible.
Except as specified above, or in the descriptions of individual
steps, implementations
must not
include PSVI properties in the outputs of steps defined by this
specification.
It is
implementation-defined
what
PSVI properties, if any, are produced by extension steps.
The exceptions in the standard XProc steps are the
p:validate-with-xml-schema, p:validate-with-relax-ng, and
p:validate-with-schematron steps, p:xslt (when XSLT 2.0 is
used), p:xquery, p:identity, and p:split-sequence.
Note
A processor that supports passing PSVI properties between steps
is always free to do so. Even if
psvi-required="false"
is explicitly specified, it is
not an error for a step to produce a result that includes
additional PSVI properties, provided it does not violate the
constraints above.
2.9 Variables
Variables are name/value pairs. Pipeline authors can create
variables to hold computed values.
[Definition: A
variable
is a name/value pair where the name is an
expanded
name
and the value
must
be a
string or
xs:untypedAtomic
.]
Variables and options share the same scope and may shadow each
other.
2.10 Options
Some steps accept options. Options are name/value pairs, like
variables. Unlike variables, the value of an option can be changed
by the caller.
[Definition: An
option
is a name/value pair where the name is an
expanded
name
and the value
must
be a
string or
xs:untypedAtomic
.]
[Definition: The
options declared on a step are its
declared
options
.]
Option names are always expressed as literal
values; pipelines cannot construct option names dynamically.
[Definition: The
options on a step which have specified values, either because a
p:with-option
element specifies a value or
because the declaration included a default value, are its
specified options
.]
How outside values are
specified for pipeline options on the pipeline initially invoked by
the processor is
implementation-defined
. In
other words, the command line options, APIs, or other mechanisms
available to specify such option values are outside the scope of
this specification.
2.11 Parameters
Some steps accept parameters. Parameters are name/value pairs,
like variables and options. Unlike variables and options, which
have names known in advance to the pipeline, parameters are not
declared and their names may be unknown to the pipeline author.
Pipelines can dynamically construct sets of parameters. Steps can
read dynamically constructed sets on
parameter input ports.
[Definition: A
parameter
is a name/value pair where the
name is an
expanded name
and the value
must
be a string or
xs:untypedAtomic
.]
[Definition: A
parameter input port
is a
distinguished kind of input port which accepts (only) dynamically
constructed parameter name/value pairs.]
See
Section 5.1.2,
“Parameter Inputs”.
Analogous to
primary input ports
, steps that
have parameter inputs may designate at most one parameter input
port as a primary parameter input port.
[Definition: If a step has a
parameter input port which is explicitly marked “
primary='true'
”, or if it has exactly one parameter
input port and that port is
not
explicitly marked “
primary='false'
”, then that parameter input
port is the
primary parameter input port
of the step.]
If a step has a single parameter input port
and that port is explicitly marked “
primary='false'
”, or if a step has more than one
parameter input port and none is explicitly marked as the primary,
then the primary parameter input port of that step is
undefined.
How outside values are
specified for pipeline parameters on the pipeline initially invoked
by the processor is
implementation-defined
. In
other words, the command line options, APIs, or other mechanisms
available to specify such parameter values are outside the scope of
this specification.
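The following sketch shows one way a pipeline might declare a parameter input port and supply an additional parameter to a step. The stylesheet file name `style.xsl` and the parameter name `mode-label` are assumptions for illustration only. Because the pipeline declares exactly one parameter input port, that port is its primary parameter input port.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <!-- The only parameter input port, hence the primary parameter input port -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- A parameter whose name is not declared anywhere in the pipeline -->
    <p:with-param name="mode-label" select="'print'"/>
  </p:xslt>
</p:declare-step>
```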
2.12 Security
Considerations
An XProc pipeline may attempt to access arbitrary network
resources: steps such as
p:load
and
p:http-request
can attempt to read from an arbitrary URI; steps such as
p:store
can attempt to
write to an arbitrary location;
p:exec
can attempt to execute an arbitrary
program. Note, also, that some steps, such as
p:xslt
and
p:xquery
, include
extension mechanisms which may attempt to execute arbitrary
code.
In some environments, it may be inappropriate to provide the
XProc pipeline with access to these resources. In a server
environment, for example, it may be impractical to allow pipelines
to store data. In environments where the pipeline cannot be
trusted, allowing the pipeline to access arbitrary resources or
execute arbitrary code may be a security risk.
It is a
dynamic
error
(err:XD0021
) for a pipeline to attempt to
access a resource for which it has insufficient privileges or
perform a step which is forbidden.
Which steps are forbidden, what privileges are needed
to access resources, and under what circumstances these security
constraints apply is
implementation-dependent.
Steps in a pipeline may call themselves recursively which could
result in pipelines which will never terminate.
A conformant XProc processor may limit the resources available
to any or all steps in a pipeline. A conformant implementation may
raise dynamic errors, or take any other corrective action, for any
security problems that it detects.
2.13 Versioning
Considerations
A pipeline author
may
identify the
version of XProc for which a particular pipeline was authored by
setting the
version
attribute. The
version
attribute can be specified
on
p:declare-step,
p:pipeline
, or
p:library
. If
specified, the value of the
version
attribute
must
be an
xs:decimal.
It is a
static error
(err:XS0063
) if the
value of the
version
attribute is
not an
xs:decimal.
The version of XProc defined by this specification is “
1.0
”.
A pipeline author
must
identify the
version of XProc on the document element of a pipeline document.
It is a
static
error
(err:XS0062
) if a required
version
attribute is not present.
The version identified applies to the element on which the
version
attribute appears and all of
its descendants, unless or until another version is explicitly
identified.
When a processor encounters an explicit version (other than a
version which it implements), it proceeds in backwards- or
forwards-compatible mode.
2.13.1 Backwards-compatible Mode
If the processor encounters a request for a previous version of
XProc (e.g., if a "2.0" processor encounters an explicit request for
the "1.0" language), it
must
process
the pipeline as if it was a processor for the requested version: it
must
enforce the semantics of the
requested version, it
must
report
steps not known in that version as errors, etc.
It is a
static
error
(err:XS0060
) if the processor encounters an
explicit request for a previous version of the language and it is
unable to process the pipeline using those semantics.
2.13.2 Forwards-compatible Mode
If the processor encounters an explicit version which it does
not recognize, it processes the pipeline in forwards-compatible
mode. Forwards-compatible mode relaxes several static errors,
turning them into dynamic errors so that a pipeline author can
write a pipeline which conditionally uses new language
features.
In forwards-compatible mode:
On any element in the XProc namespace, unrecognized attributes
(other than extension attributes) are ignored.
On any step in the XProc namespace, unknown options are
ignored.
If a step in the XProc namespace includes an unknown input port
with an explicit connection, the connection is treated normally for
the purpose of computing the dependencies in the pipeline but it is
otherwise ignored. Unknown input ports
must
not
be treated as
primary input ports
; it will
always be an error if they are used but not explicitly
connected.
If a step in the pipeline includes an explicit connection to an
unknown output port on a step in the XProc namespace, the
connection is treated normally for the purpose of computing the
dependencies in the pipeline. An empty sequence of documents
must
appear on that connection.
As a consequence of the rules above, future specifications
must not
change the semantics of
existing step types without changing their names. Although they may
add new input and output ports, such changes should be done with
care; they
should
in some sense be
limited to ancillary inputs and outputs and they
must not
be
primary input ports.
2.13.2.1 Examples
In forwards-compatible mode, it is not a static error to
encounter the following step:
The processor will simply ignore the “
ancillary
” port.
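A step of the kind described might look like the following sketch. The `version="2.0"` value, the step name, and the file name `extra.xml` are assumptions chosen for illustration; the point is only that `ancillary` is not a port of `p:identity` in XProc 1.0.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="2.0">
  <p:identity name="copy">
    <!-- "ancillary" is unknown to a 1.0 processor; in forwards-compatible
         mode the connection is used for dependency analysis and is
         otherwise ignored -->
    <p:input port="ancillary">
      <p:document href="extra.xml"/>
    </p:input>
  </p:identity>
</p:pipeline>
```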
Suppose that XProc version 2.0 changes the definition of the
p:xslt
step so
that it has an additional output port,
messages
. Then consider the following pipeline:
When run by a "2.0" or later processor, it will count the
documents that appear on the
messages
port.
When run by a “1.0” processor in forwards-compatible mode, the
binding to the “
messages
” port is not a
static error. Dynamically, the "1.0" processor will always produce
a count of zero, because an empty sequence of documents will always
appear on the
messages
port.
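The pipeline discussed above might be sketched as follows. The step name `style` and the stylesheet location `style.xsl` are hypothetical; the `messages` port is the imagined "2.0" addition and does not exist on `p:xslt` in XProc 1.0.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="2.0">
  <p:xslt name="style">
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Counts the documents on the (hypothetical) messages port -->
  <p:count>
    <p:input port="source">
      <p:pipe step="style" port="messages"/>
    </p:input>
  </p:count>
</p:pipeline>
```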
3 Syntax Overview
This section describes the normative XML syntax of XProc. This
syntax is sufficient to represent all the aspects of a pipeline, as
set out in the preceding sections.
[Definition: XProc is intended to work equally well with [
XML 1.0
] and [
XML 1.1
]. Unless
otherwise noted, the term “
XML
” refers
equally to both versions.]
[Definition: Unless otherwise noted, the
term
Namespaces in XML
refers equally to [
Namespaces 1.0
] and [
Namespaces
1.1
].]
Support
for pipeline documents written in XML 1.1 and pipeline inputs and
outputs that use XML 1.1 is
implementation-defined.
Elements in a pipeline document represent the pipeline, the
steps it contains, the connections between those steps, the steps
and connections contained within them, and so on. Each step is
represented by an element; a combination of elements and attributes
specify how the inputs and outputs of each step are connected and
how options and parameters are passed.
Conceptually, we can speak of steps as objects that have inputs
and outputs, that are connected together and which may contain
additional steps. Syntactically, we need a mechanism for specifying
these relationships.
Containment
is represented naturally using
nesting of XML elements. If a particular element identifies a
compound
step
then the step elements that are its immediate
children form its
subpipeline.
The connections between steps are expressed using names and
references to those names.
Six kinds of things are named in XProc:
Step types,
Steps,
Input ports (both parameter and document),
Output ports,
Options and variables, and
Parameters
3.1 XProc
Namespaces
There are three namespaces associated with XProc:
The namespace of the XProc XML vocabulary described by this
specification; by convention, the namespace prefix “
p:
” is used for this namespace.
The namespace used for documents that are inputs to and outputs
from several standard and optional steps described in this
specification. Some steps, such as
p:http-request
and
p:store,
have defined input or output vocabularies. We use this namespace
for all of those documents. The conventional prefix “
c:
” is used for this namespace.
The namespace used for errors. The conventional prefix “
err:
” is used for this namespace.
This specification also makes use of the prefix “
xs:
” to refer to the [
W3C XML Schema: Part 2
] namespace.
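For reference, a pipeline document that declares all of the conventional prefixes might begin as in the sketch below. The namespace names shown are those used by XProc 1.0 and XML Schema; the body of the pipeline is a placeholder.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:c="http://www.w3.org/ns/xproc-step"
            xmlns:err="http://www.w3.org/ns/xproc-error"
            xmlns:xs="http://www.w3.org/2001/XMLSchema"
            version="1.0">
  <!-- Placeholder subpipeline -->
  <p:identity/>
</p:pipeline>
```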
3.2 Scoping of
Names
Names are used to identify step types, steps, ports, options and
variables, and parameters. Step types, options, variables, and
parameters are named with QNames. Steps and ports are named with
NCNames. The scope of a name is a measure of where it is available
in a pipeline.
[Definition:
If two names are in the same scope, we say that they are
visible
to each other. ]
The scope of the names of the step types is the pipeline in
which they are declared, including any declarations imported from
libraries via
p:import
. Nested pipelines inherit the step
types in scope for their parent.
In other words, the step types that are in scope in a
p:pipeline
or
p:declare-step
are:
The standard, built-in types (
p:pipeline,
p:choose
, etc.).
Any implementation-provided types.
Any step types declared in the pipeline (the
p:pipeline
and
p:declare-step
children of the pipeline
element).
The types of any
p:pipeline
s or
p:declare-step
s
that are imported.
Any types that are in the scope of any
p:library
that is
imported.
Any step types that are in scope for the pipeline's parent
p:pipeline
or
p:declare-step
, if it has one.
The type of the pipeline itself, if it has one.
The step types that are in scope in a
p:library
are:
The standard, built-in types (
p:pipeline,
p:choose
, etc.).
Any implementation-provided types.
Any step types declared in the library (the
p:pipeline
and
p:declare-step
children of the
p:library
element).
The types of
p:pipeline
s or
p:declare-step
s
that are imported into the library.
Any types that are in the scope of any
p:library
that is
imported.
All the
step types in a pipeline or library
must
have unique names: it is a
static
error
(err:XS0036
) if any step type name is built-in
and/or declared or defined more than once in the same scope.
The scope of the names of the steps themselves is determined by
the
environment
of each step. In general,
the name of a step, the names of its sibling steps, the names of
any steps that it contains directly, the names of its ancestors,
and the names of the siblings of its ancestors are all in a common
scope.
All
steps in the same scope
must
have
unique names: it is a
static error
(err:XS0002
) if two
steps with the same name appear in the same scope.
The scope of an input or output port name is the step on which
it is defined. The names of all the ports on any step
must
be unique.
Taken together, these uniqueness constraints guarantee that the
combination of a step name and a port name uniquely identifies
exactly one port on exactly one in-scope step.
The scope of option and variable names is determined by where
they are declared. When an option is declared with
p:option
(or a
variable with
p:variable
), unless otherwise specified, its
scope consists of the sibling elements that follow its declaration
and the descendants of those siblings.
It is a
static error
(err:XS0004
) if an
option or variable declaration duplicates the name of any other
option or variable in the same
environment
. That is, no option or
variable may lexically shadow another option or variable with the
same name.
Parameter names are not scoped; they are distinct on each
step.
3.3 Base URIs and xml:base
When a relative URI appears in an option value, the base URI
against which it
must
be made absolute
is the base URI of the
p:option
element. If an option value is
specified using a
syntactic
shortcut
, the base URI of the step on which the shortcut
attribute appears
must
be used. In
general, whenever a relative URI appears, its base URI is the base
URI of the nearest ancestor element.
The pipeline author can control the base URIs of elements within
the pipeline document with the
xml:base
attribute. The
xml:base
attribute
may
appear on any element in a pipeline and has
the semantics outlined in [
XML Base
].
3.4 Unique identifiers
A pipeline author can provide a globally unique identifier for
any element in a pipeline with the
xml:id
attribute.
The
xml:id
attribute
may
appear on any element in a pipeline and has
the semantics outlined in [
xml:id
].
3.5 Associating Documents with
Ports
[Definition: A
connection
associates an input or output
port with some data source.]
A document or a sequence of
documents can be connected to a port in four ways:
by source,
by URI
, by providing an
inline
document
, or by making it
explicitly empty
. Each of these
mechanisms is allowed on the
p:input,
p:output,
p:xpath-context,
p:iteration-source
, and
p:viewport-source
elements.
Specified by URI
[Definition: A document is
specified
by URI
if it is referenced
with a URI.]
The
href
attribute on the
p:document
or
p:data
element is used to refer to documents
by URI.
In this example, the input to the
p:identity
step
named “
otherstep
” comes from “
”.
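A connection by URI might be sketched as follows. The URI `http://example.com/input.xml` is a hypothetical placeholder, and the fragment is intended to be placed inside a pipeline.

```xml
<p:identity xmlns:p="http://www.w3.org/ns/xproc" name="otherstep">
  <p:input port="source">
    <!-- The document is read from the (hypothetical) URI below -->
    <p:document href="http://example.com/input.xml"/>
  </p:input>
</p:identity>
```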
Specified by source
[Definition: A document
is specified
by source
if it references
a specific port on another step.]
The
step
and
port
attributes on the
p:pipe
element are used for this
purpose.
In this example, the “
source
” input to
the
p:xinclude
step named “
expand
” comes from the “
result
” port of the step named “
otherstep
”.
See the description of
p:pipe
for a complete description of the
ports that can be connected.
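A connection by source consistent with the description above might be sketched as follows. The fragment assumes that a step named `otherstep` with a `result` port is in scope in the enclosing pipeline.

```xml
<p:xinclude xmlns:p="http://www.w3.org/ns/xproc" name="expand">
  <p:input port="source">
    <!-- Reads whatever appears on the result port of "otherstep" -->
    <p:pipe step="otherstep" port="result"/>
  </p:input>
</p:xinclude>
```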
Specified inline
[Definition: An
inline document
is specified directly in
the body of the element to which it connects.]
The content
of the
p:inline
element is used for this
purpose.
In this example, the “
stylesheet
” input
to the XSLT step named “
xform
” comes from
the content of the
p:input
element itself.
...
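An inline connection of this kind might be sketched as follows. The stylesheet body, an identity transform, is an assumption chosen only so that the fragment is complete.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc" name="xform">
  <p:input port="stylesheet">
    <p:inline>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                      version="1.0">
        <!-- Identity template, for illustration only -->
        <xsl:template match="@*|node()">
          <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
          </xsl:copy>
        </xsl:template>
      </xsl:stylesheet>
    </p:inline>
  </p:input>
</p:xslt>
```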
Inline documents are considered “quoted”. The pipeline processor
passes them literally to the port, even if they contain elements
from the XProc namespace or other namespaces that would have other
semantics outside of the
p:inline.
Specified explicitly empty
[Definition: An
empty sequence
of documents is specified
with the
p:empty
element.]
In this example, the “
source
” input to
the XSLT 2.0 step named “
generate
” is
explicitly empty:
...
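An explicitly empty connection might be sketched as follows. The stylesheet location `generate.xsl` and the initial template name `main` are hypothetical; with no source document, an XSLT 2.0 transformation would typically start from a named template.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc" name="generate">
  <p:input port="source">
    <!-- No documents at all appear on the source port -->
    <p:empty/>
  </p:input>
  <p:input port="stylesheet">
    <p:document href="generate.xsl"/>
  </p:input>
  <p:with-option name="version" select="'2.0'"/>
  <p:with-option name="template-name" select="'main'"/>
</p:xslt>
```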
If you omit the connection on a primary input port, a connection
to the
default readable port
will be
assumed. Making the connection explicitly empty guarantees that the
connection will be to an empty sequence of documents.
It is inconsistent with the [
XPath 1.0
] specification
to specify an empty connection as the context for evaluating an
XPath expression. When an empty connection is specified for an
XPath 1.0 expression, an empty document node
must
be used instead as the context node.
Note that a
p:input
or
p:output
element may contain more than one
p:pipe,
p:document,
p:data
, or
p:inline
element. If
more than one
connection
is provided, then the
specified sequence of documents is made available on that port in
the same order as the connections.
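As a sketch of several connections on one port, the following fragment supplies three documents, in order, to a step that accepts a sequence. The step name `otherstep`, the file name `appendix.xml`, and the wrapper element name `collection` are assumptions for illustration.

```xml
<p:wrap-sequence xmlns:p="http://www.w3.org/ns/xproc" wrapper="collection">
  <p:input port="source">
    <!-- First: documents from another step's result port -->
    <p:pipe step="otherstep" port="result"/>
    <!-- Second: a document loaded by URI -->
    <p:document href="appendix.xml"/>
    <!-- Third: an inline document -->
    <p:inline><note>trailing note</note></p:inline>
  </p:input>
</p:wrap-sequence>
```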
3.6 Documentation
Pipeline authors may add documentation to their pipeline
documents with the
p:documentation
element. Except when it
appears as a descendant of
p:inline
, the
p:documentation
element is completely ignored by pipeline processors; it exists
simply for documentation purposes. If a
p:documentation
is provided as a descendant of
p:inline
, it has no special semantics; it is
treated literally as part of the document to be provided on that
port. The
p:documentation
element has no special
semantics when it appears in documents that flow through the
pipeline.
Pipeline processors that inspect the contents of
p:documentation
elements and behave differently on the basis of what they find are
not conformant
. Processor extensions
must
be specified with
p:pipeinfo.
3.7 Processor
annotations
Pipeline authors may add annotations to their pipeline documents
with the
p:pipeinfo
element.
The semantics of
p:pipeinfo
elements are
implementation-defined.
Processors
should
specify a way for
their annotations to be identified, perhaps with
extension attributes.
Where
p:documentation
is intended for human
consumption,
p:pipeinfo
elements are intended for
processor consumption. A processor might, for example, use
annotations to identify some particular aspect of an
implementation, to request additional, perhaps non-standard
features, to describe parallelism constraints, etc.
When a
p:pipeinfo
appears as a descendant of
p:inline
, it
has no special semantics; in that context it
must
be treated literally as part of the document
to be provided on that port. The
p:pipeinfo
element has no special semantics
when it appears in documents that flow through the pipeline.
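The difference between the two elements might be illustrated by the sketch below. The `ex` namespace and the `ex:hint` element are hypothetical; their meaning, if any, would be defined by a particular implementation.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.com/ns/xproc-annotations"
            version="1.0">
  <!-- For human readers; ignored by the processor -->
  <p:documentation>Copies the input to the output unchanged.</p:documentation>

  <!-- For the processor; semantics are implementation-defined -->
  <p:pipeinfo>
    <ex:hint parallel="false"/>
  </p:pipeinfo>

  <p:identity/>
</p:pipeline>
```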
3.8 Extension attributes
[Definition:
An element from the XProc namespace
may
have any attribute not from the XProc
namespace, provided that the expanded-QName of the attribute has a
non-null namespace URI. Such an attribute is called an
extension attribute
.]
The presence of an extension attribute must not cause the
connections between steps to differ from the connections that would
arise in the absence of the attribute. They must not cause the
processor to fail to signal an error that would be signaled in the
absence of the attribute.
A processor which encounters an extension attribute that it does
not implement
must
behave as if the
attribute was not present.
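An extension attribute might appear as in the following sketch. The `ex` namespace and the `ex:trace` attribute are hypothetical; a processor that does not implement them behaves as if the attribute were absent.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.com/ns/xproc-extensions"
            version="1.0">
  <!-- ex:trace has a non-null namespace URI, so it is an extension attribute -->
  <p:identity ex:trace="true"/>
</p:pipeline>
```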
3.9 Conditional
Element Exclusion
Any element in the XProc namespace may have a
use-when
attribute which
must
contain an XPath expression that can be
evaluated statically. If the attribute is present and the effective
boolean value of the expression is false, then the element and all
of its descendants are effectively excluded from the pipeline
document. If a node is effectively excluded, the processor
must
behave as if the element was not
present in the document.
Elements that are not in the XProc namespace
may
also have a
use-when
attribute, but the attribute must be
in the XProc namespace. The semantics of a
p:use-when
attribute on an element not in the
XProc namespace are the same as the semantics of a
use-when
attribute on an element in the XProc
namespace.
Conditional element exclusion occurs before any static analysis
of the pipeline.
Note
The effective exclusion of
use-when
processing occurs after XML parsing
and has no effect on well-formedness or validation errors which
will be reported in the usual way. Note also that
use-when
is not performed when it occurs on
the descendant of a
p:inline
element.
For the purposes of evaluating a
use-when
expression, the context node,
position, and size are all undefined. No
in-scope
bindings
are available. There are no readable ports. There
are no available documents or available collections.
There are some additional restrictions on the XPath extension
functions that are available in a
use-when
expression:
The
p:episode
system property
should not
be used.
The value of the
p:episode
system property in a
use-when
expression is
implementation-dependent.
The
p:step-available
function cannot be used to
test for the availability of extension steps (because the libraries
that declare them may not have been imported).
The results of testing for steps not in the XProc
namespace in a
use-when
expression
are
implementation-dependent.
The steps available and possibly other aspects of the expression
may depend on the version specified for a pipeline, see
Section 2.13, “Versioning
Considerations”
. For example, in a “1.0” pipeline, the
processor
should not
report that “2.0”
steps are available.
It is a
static
error
(err:XS0061
) if a
use-when
expression refers to the context or
attempts to refer to any documents or collections.
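Conditional element exclusion might be used as in the sketch below, which validates the input only when the optional RELAX NG step is available and otherwise passes the input through. The schema location `schema.rng` is a hypothetical placeholder. Exactly one of the two steps survives exclusion, so the subpipeline is never empty.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Kept only if the processor provides p:validate-with-relax-ng -->
  <p:validate-with-relax-ng
      use-when="p:step-available('p:validate-with-relax-ng')">
    <p:input port="schema">
      <p:document href="schema.rng"/>
    </p:input>
  </p:validate-with-relax-ng>

  <!-- Kept only if it does not -->
  <p:identity
      use-when="not(p:step-available('p:validate-with-relax-ng'))"/>
</p:pipeline>
```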
3.10 Syntax Summaries
The description of each element in the pipeline namespace is
accompanied by a syntactic summary that provides a quick overview
of the element's syntax:
<p:some-element
  some-attribute? = some-type>
    (some |
     elements |
     allowed)*,
    other-elements?
</p:some-element>
The content model fragments in these tableaux are presented in a
simple, compact notation. In brief:
A name represents exactly one occurrence of an element with that
name.
Parentheses are used for grouping.
Elements or groups separated by a comma (“,”) represent an
ordered sequence: a followed by b followed by c: (a,b,c).
Elements or groups separated by a vertical bar (“|”) represent a
choice: a or b or c: (a | b | c).
Elements or groups separated by an ampersand (“&”) represent
an unordered sequence: a and b and c, in any order: (a & b
& c).
An element or group followed by a question mark (“?”) is
optional; it may or may not occur but if it occurs it can occur
only once.
An element or group followed by an asterisk (“*”) is optional
and may be repeated; it may or may not occur and if it occurs it
can occur any number of times.
An element or group followed by a plus (“+”) is required and may
be repeated; it must occur at least once, and it can occur any
number of times.
For clarity of exposition, some attributes and elements are
elided from the summaries:
An
xml:id
attribute is allowed on
any element. It has the semantics of [
xml:id
].
An
xml:base
attribute is allowed
on any element. It has the semantics of [
XML Base
].
A
use-when
attribute is allowed
on any element, see
Section 3.9, “Conditional
Element Exclusion”.
The
p:documentation
and
p:pipeinfo
elements
are not shown, but they are allowed anywhere.
The
p:log
element is allowed on any step that has a
p:output.
Attributes that are
syntactic
shortcuts for option values
are not shown.
The types given for attributes should be understood as
follows:
ID, NCName, NMTOKEN, NMTOKENS, anyURI, boolean, integer, string
: As per [
W3C XML Schema: Part 2
] including whitespace normalization as
appropriate.
QName
: With whitespace normalization as
per [
W3C XML Schema:
Part 2
] and according to the following definition: In
the context of XProc, a
QName
is almost
always a QName in the
Namespaces in XML
sense. Note,
however, that
p:option
and
p:with-param
values can get their namespace declarations in a non-standard way
(with
p:namespaces
) and QNames that have no prefix
are always in no-namespace, irrespective of the default
namespace.
PrefixList
: As a list with
[item type]
NMTOKEN
, per [
W3C XML Schema: Part 2
], including whitespace normalization.
XPathExpression, XSLTMatchPattern
: As a string per [
W3C XML Schema: Part 2
], including whitespace normalization, and the further
requirement to be a conformant Expression per [
XPath 1.0
] or [
XPath 2.0
], as
appropriate, or Match pattern per [
XSLT 1.0
] or [
XSLT 2.0
], as appropriate.
3.11 Common
errors
A number of errors apply generally:
It is a
static
error
(err:XS0059
) if the pipeline element is not
p:pipeline,
p:declare-step
, or
p:library.
It is a
static
error
(err:XS0008
) if any element in the XProc
namespace has attributes not defined by this specification unless
they are
extension attributes.
It is a
static
error
(err:XS0038
) if any required attribute is not
provided.
It is a
dynamic
error
(err:XD0028
) if any attribute value does not
satisfy the type required for that attribute.
It is a
static
error
(err:XS0044
) if any element in the XProc
namespace or any step has element children other than those
specified for it by this specification. In particular, the presence
of atomic steps for which there is no visible declaration may raise
this error.
It is a
static
error
(err:XS0037
) if any step directly contains
text nodes that do not consist entirely of whitespace.
It is a
dynamic
error
(err:XD0019
) if any option value does not
satisfy the type required for that option.
It is a
static
error
(err:XS0015
) if a compound step has no
contained
steps.
It is a
dynamic
error
(err:XD0012
) if any attempt is made to
dereference a URI where the scheme of the URI reference is not
supported. Implementations are encouraged to support as many
schemes as is practical and, in particular, they
should
support both the
file:
and
http(s):
schemes.
The set of URI schemes actually
supported is
implementation-defined.
It is a
dynamic
error
(err:XD0030
) if a step is unable or incapable
of performing its function. This is a general error code for “step
failed” (e.g., if the input isn't of the expected type or if
attempting to process the input causes the implementation to
abort). Users and implementors who create extension steps are
encouraged to use this code for general failures.
In most steps which use a select expression or match pattern,
any kind of node can be identified by the expression or pattern.
However, some expressions and patterns on some steps are only
applicable to some kinds of nodes (e.g., it doesn't make sense to
speak of adding attributes to a comment!).
It is a
dynamic
error
(err:XC0023
) if a select expression or match
pattern returns a node type that is not allowed by the step.
If an XProc processor can determine statically that a dynamic
error will
always
occur, it
may
report that error statically provided that the
error
does not
occur among the descendants of a
p:try
. Dynamic errors
inside a
p:try
must not
be reported statically. They
must be raised dynamically so that
p:catch
processing can be performed on
them.
4 Steps
This section describes the core steps of XProc.
Several of the steps defined in this specification refer to
other, evolving XML technologies (XSLT, XQuery, XSL-FO, etc.).
Where this specification identifies a specific version of a
technology, implementors
must
implement the specified version or any subsequent edition or
version that is backwards compatible. At user option, they may
support other, incompatible versions or extensions.
4.1 p:pipeline
p:pipeline
declares a pipeline
that can be evaluated by an XProc processor. It encapsulates the
behavior of a
subpipeline
. Its children declare
inputs, outputs, and options that the pipeline exposes and identify
the steps in its subpipeline. (A
p:pipeline
is a simplified form of
step declaration
.)
All
p:pipeline
pipelines have an
implicit
primary input port
named “
source
”, an implicit
primary
parameter input port
named “
parameters
”, and an implicit
primary output
port
named “
result
”. Any input or
output ports that the
p:pipeline
declares explicitly are
in addition
to those ports and may
not be declared primary.
<p:pipeline
  name? = NCName
  type? = QName
  psvi-required? = boolean
  xpath-version? = string
  exclude-inline-prefixes? = prefix list
  version? = string>
    (p:input |
     p:output |
     p:option |
     p:log |
     p:serialization)*,
    ((p:declare-step |
      p:pipeline |
      p:import)*,
     subpipeline)
</p:pipeline>
Viewed from the outside, a
p:pipeline
is a black box which performs some
calculation on its inputs and produces its outputs. From the
pipeline author's perspective, the computation performed by the
pipeline is described in terms of
contained steps
which read the
pipeline's inputs and produce the pipeline's outputs.
The
version
attribute identifies
the version of XProc for which this pipeline was authored. If the
p:pipeline
has no ancestors in the
XProc namespace, then it
must
have a
version
attribute. See
Section 2.13, “Versioning
Considerations”.
If a pipeline does not have a
type
then that pipeline cannot be invoked as a
step.
The
p:pipeline
element is just a
simplified form of step declaration. A document that reads:
some-content
can be interpreted as if it read:
some-content
See
p:declare-step
for more details.
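To make the equivalence concrete, consider the following sketch. The single `p:xinclude` step stands in for arbitrary content and is an assumption for illustration. The short form:

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
  <p:xinclude/>
</p:pipeline>
```

can be read as if the three implicit ports had been declared explicitly:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
  <!-- The ports that p:pipeline supplies implicitly -->
  <p:input port="source" primary="true"/>
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:output port="result" primary="true"/>
  <p:xinclude/>
</p:declare-step>
```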
4.1.1 Example
A pipeline might accept a document as input; perform XInclude,
validation, and transformation; and produce the transformed
document as its output.
Example 4. A Sample Pipeline
Document
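A pipeline matching this description might be sketched as follows. The schema and stylesheet URIs are hypothetical placeholders. Each step reads the primary output of the step before it, and the last step's primary output becomes the pipeline's result.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Expand XInclude references in the pipeline's source document -->
  <p:xinclude/>

  <!-- Validate the expanded document -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="http://example.com/path/to/schema.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Transform the validated document -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="http://example.com/path/to/stylesheet.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```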
4.2 p:for-each
A for-each is specified by the
p:for-each
element. It is a
compound step
that
processes a sequence of documents, applying its
subpipeline
to each
document in turn.
<p:for-each
  name? = NCName>
    ((p:iteration-source? &
      (p:output |
       p:log)*),
     subpipeline)
</p:for-each>
When a pipeline needs to process a sequence of documents using a
subpipeline that only processes a single document, the
p:for-each
construct can be used as a wrapper
around that subpipeline. The
p:for-each
will apply that subpipeline to each
document in the sequence in turn.
The result of the
p:for-each
is a
sequence of documents produced by processing each individual
document in the input sequence. If the
p:for-each
has one or more output ports, what
appears on each of those ports is the sequence of documents that is
the concatenation of the sequence produced by each iteration of the
loop on the port to which it is connected. If the iteration source
for a
p:for-each
is an empty sequence,
then the subpipeline is never run and an empty sequence is produced
on all of the outputs.
The
p:iteration-source
is an anonymous input:
its
connection
provides a sequence of
documents to the
p:for-each
step. If
no iteration sequence is explicitly provided, then the iteration
source is read from the
default readable port.
The processor provides each document, one at a time, to the
subpipeline
represented by the children
of the
p:for-each
on a port named
current.
For each declared output, the processor collects all the
documents that are produced for that output from all the
iterations, in order, into a sequence. The result of the
p:for-each
on that output is that sequence of
documents.
The environment inherited by the
contained steps
of a
p:for-each
is the
inherited environment
with
these modifications:
The port named “
current
” on the
p:for-each
is added to the
readable
ports
The port named “
current
” on the
p:for-each
is made the
default readable
port
If the
p:for-each
has a
primary output
port
(explicit or
supplied
by default
) and that port has no
connection
, then it is connected to the
primary
output port
of the
last step
in the
subpipeline.
It is a
static
error
(err:XS0006
) if the primary output port has no
explicit connection and the
last step
in the subpipeline does not have
a primary output port.
Note that outputs declared for a
p:for-each
serve a dual role. Inside the
p:for-each
, they are used to read
results from the subpipeline. Outside the
p:for-each
, they provide the aggregated
results.
The
sequence
attribute on a
p:output
inside a
p:for-each
only applies
inside the step. From the outside, all of the outputs produce
sequences.
4.2.1 XPath Context
Within a
p:for-each
, the
p:iteration-position
and
p:iteration-size
are taken from the sequence of documents that will be processed by
the
p:for-each
. The total number of documents is
the
p:iteration-size
; the ordinal value of the
current document (the document appearing on the
current
port) is the
p:iteration-position.
Note to implementers
In the case where no XPath expression that must be evaluated by
the processor makes any reference to
p:iteration-size
its value does not actually have to be calculated (and the entire
input sequence does not, therefore, need to be buffered so that its
size can be calculated before processing begins).
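As a sketch of how the iteration position might be used, the following fragment numbers each chapter as it is processed. The `chapter` element and the `number` attribute are assumptions for illustration; the fragment is intended to be placed inside a pipeline.

```xml
<p:for-each xmlns:p="http://www.w3.org/ns/xproc">
  <p:iteration-source select="//chapter"/>

  <!-- Each chapter arrives on the "current" port as its own document -->
  <p:add-attribute match="/chapter" attribute-name="number">
    <!-- The ordinal of the current document within the sequence -->
    <p:with-option name="attribute-value" select="p:iteration-position()"/>
  </p:add-attribute>
</p:for-each>
```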
4.2.2 Example
A
p:for-each
might accept a sequence of
chapters as its input, process each chapter in turn with XSLT, a
step that accepts only a single input document, and produce a
sequence of formatted chapters as its output.
Example 5. A Sample For-Each
The
//chapter
elements of the document are
selected. Each chapter is transformed into HTML and XSL Formatting
Objects using an XSLT step. The resulting HTML and FO documents are
aggregated together and appear on the
html-results
and
fo-results
ports, respectively, of the
chapters
step
itself.
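The scenario described above might be sketched as follows. The stylesheet URIs, the inner step names `make-html` and `make-fo`, and the outer port names `html` and `fo` are hypothetical. The first XSLT step reads the default readable port, which is `current`; the second connects to `current` explicitly.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="html" sequence="true">
    <p:pipe step="chapters" port="html-results"/>
  </p:output>
  <p:output port="fo" sequence="true">
    <p:pipe step="chapters" port="fo-results"/>
  </p:output>

  <p:for-each name="chapters">
    <p:iteration-source select="//chapter"/>
    <p:output port="html-results">
      <p:pipe step="make-html" port="result"/>
    </p:output>
    <p:output port="fo-results">
      <p:pipe step="make-fo" port="result"/>
    </p:output>

    <p:xslt name="make-html">
      <p:input port="stylesheet">
        <p:document href="http://example.com/xsl/html.xsl"/>
      </p:input>
    </p:xslt>

    <p:xslt name="make-fo">
      <p:input port="source">
        <p:pipe step="chapters" port="current"/>
      </p:input>
      <p:input port="stylesheet">
        <p:document href="http://example.com/xsl/fo.xsl"/>
      </p:input>
    </p:xslt>
  </p:for-each>
</p:declare-step>
```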
4.3 p:viewport
A viewport is specified by the
p:viewport
element. It is a
compound step
that
processes a single document, applying its
subpipeline
to one
or more subtrees of the document.
<p:viewport
  name? = NCName
  match = XSLTMatchPattern>
    ((p:viewport-source? &
      p:output? &
      p:log?),
     subpipeline)
</p:viewport>
The result of the
p:viewport
is a
copy of the original document where the selected subtrees have been
replaced by the results of applying the subpipeline to them.
The
p:viewport-source
is an anonymous input: its
connection
provides a single document to the
p:viewport
step. If no document is explicitly
provided, then the viewport source is read from the
default readable
port.
It is a dynamic error (err:XD0003) if the
viewport source does not provide exactly one document.
The
match
attribute specifies an
XSLT match pattern. Each matching node in the source document is
wrapped in a document node, as necessary, and provided, one at a
time, to the viewport's
subpipeline
on a port named
current
. The base URI of the resulting document that is
passed to the subpipeline is the base URI of the matched element or
document.
It is a dynamic error (err:XD0010) if the
match
expression on
p:viewport
does not match an element or
document.
After a match is found, the entire subtree rooted at that match
is processed as a unit. No further attempts are made to match nodes
among the descendants of any matched node.
The environment inherited by the
contained steps
of a
p:viewport
is the
inherited environment
with
these modifications:
The port named “current” on the p:viewport is added to the readable ports.
The port named “current” on the p:viewport is made the default readable port.
The
p:viewport
must contain a
single,
primary output port
declared
explicitly or
supplied by
default
. If that port has no
connection
, then it is connected to the
primary
output port
of the
last step
in the
subpipeline.
It is a static error (err:XS0006) if the primary output port is
unconnected and the
last step
in the subpipeline does not have
a primary output port.
What appears on the output from the
p:viewport
will be a copy of the input document
where each matching node is replaced by the result of applying the
subpipeline to the subtree rooted at that node. In other words, if
the match pattern matches a particular element then that element is
wrapped in a document node and provided on the
current
port, the subpipeline in the
p:viewport
is evaluated, and the result that
appears on the
output
port replaces the
matched element.
If no documents appear on the
output
port,
the matched element will effectively be deleted. If exactly one
document appears, the contents of that document will replace the
matched element. If a sequence of documents appears, then the
contents of each document in that sequence (in the order it appears
in the sequence) will replace the matched element.
The output of the
p:viewport
itself
is a single document that appears on a port named “
result
”. Note that the semantics of
p:viewport
are special. The
output
port in the
p:viewport
is used only to access the results of
the subpipeline. The output of the step itself appears on a port
with the fixed name “
result
” that is never
explicitly declared.
4.3.1 XPath Context
Within a
p:viewport
, the
p:iteration-position
and
p:iteration-size
are taken from the sequence of documents that will be processed by
the
p:viewport
. The total number of documents is
the
p:iteration-size
; the ordinal value of the
current document (the document appearing on the
current
port) is the
p:iteration-position.
Note to implementers
In the case where no XPath expression that must be evaluated by
the processor makes any reference to
p:iteration-size,
its value does not actually have to be calculated (and the entire
input sequence does not, therefore, need to be buffered so that its
size can be calculated before processing begins).
4.3.2 Example
A p:viewport
might accept an XHTML document as
its input, add an
hr
element at the
beginning of all
div
elements that
have the class value “chapter”, and return an XHTML document that
is the same as the original except for that change.
Example 6. A Sample Viewport
The nodes which match
h:div[@class='chapter']
in the input document are
selected. An
hr
is inserted as the first
child of each
h:div
and the resulting version
replaces the original
h:div
. The result of
the whole step is a copy of the input document with a horizontal
rule as the first child of each selected
h:div.
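A viewport of this kind might be sketched as follows. The use of the standard p:insert step and the inline hr element are assumptions about how the insertion is performed; the match pattern is the one quoted above.

```xml
<p:viewport match="h:div[@class='chapter']"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:h="http://www.w3.org/1999/xhtml">
  <!-- Each matched h:div arrives on the "current" port, which is the
       default readable port, so p:insert reads it as its source. -->
  <p:insert position="first-child">
    <p:input port="insertion">
      <p:inline>
        <hr xmlns="http://www.w3.org/1999/xhtml"/>
      </p:inline>
    </p:input>
  </p:insert>
  <!-- The result of p:insert replaces the matched h:div in the copy of
       the input document that appears on the viewport's result port. -->
</p:viewport>
```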
4.4 p:choose
A choose is specified by the
p:choose
element. It is a
multi-container
step
that selects exactly one of a list of alternative
subpipelines
based on the evaluation of
XPath expressions.
<p:choose
  name? = NCName>
    (p:xpath-context?,
     p:variable*,
     p:when*,
     p:otherwise?)
</p:choose>
A p:choose
has no inputs. It
contains an arbitrary number of alternative
subpipelines,
exactly one of which will be evaluated.
The list of alternative subpipelines consists of zero or more
subpipelines guarded by an XPath expression, followed optionally by
a single default subpipeline.
The
p:choose
considers each
subpipeline in turn and selects the first (and only the first)
subpipeline for which the guard expression evaluates to true in its
context. If there are no subpipelines for which the expression
evaluates to true, the default subpipeline, if it was specified, is
selected.
After a
subpipeline
is selected, it is evaluated
as if only it had been present.
The outputs of the
p:choose
are
taken from the outputs of the selected
subpipeline
. The
p:choose
has the same number of
outputs as the selected subpipeline with the same names. If the
selected subpipeline has a
primary output port
, the port
with the same name on the
p:choose
is
also a primary output port.
In order to ensure that the output of the
p:choose
is consistent irrespective of the
subpipeline
chosen, each
subpipeline
must
declare the same number of outputs with the same names. If any of
the subpipelines specifies a
primary output port
, each
subpipeline must specify exactly the same output as primary.
It is a static error (err:XS0007) if two
subpipelines
in a
p:choose
declare different
outputs.
As a convenience to authors, it is not an error if some
subpipelines declare outputs that can produce sequences and some do
not. Each output of the
p:choose
is
declared to produce a sequence if that output is declared to
produce a sequence in any of its subpipelines.
It is a dynamic error (err:XD0004) if no
subpipeline
is
selected by the
p:choose
and no
default is provided.
The
p:choose
can specify the
context node against which the XPath expressions that occur on each
branch are evaluated. The context node is specified as a
connection
in the
p:xpath-context
. If no explicit connection
is provided, the
default readable port
is used.
In an XPath 1.0 implementation, if the context node is connected to
p:empty
, or is
unconnected and the
default readable port
is
undefined, an
empty document
node
is used instead as the context. In an XPath 2.0
implementation, the context item is undefined.
Each conditional
subpipeline
is represented by a
p:when
element. The
default branch is represented by a
p:otherwise
element.
4.4.1 p:xpath-context
A p:xpath-context
element specifies
the context node against which an XPath expression will be
evaluated. When it appears in a
p:when
, it specifies the context for that
p:when
’s
test
attribute. When it appears in
p:choose
, it
specifies the default context for all of the
p:when
elements in that
p:choose.
<p:xpath-context>
    (p:empty |
      p:pipe |
      p:document |
      p:inline |
      p:data)
</p:xpath-context>
Only one
connection
is allowed and it works the
same way that connections work on a
p:input
. No
select
expression is allowed.
It is a dynamic error (err:XD0005) if more than one document appears
on the connection for the
xpath-context.
The
p:xpath-context
element only
provides the context node. The namespace bindings, in-scope
variables, and other aspects of the context come from the element
on which the XPath expression occurs.
In an XPath 1.0 implementation, if the context node is connected
to
p:empty
, or
is unconnected and the
default readable port
is
undefined, an
empty document
node
is used instead as the context. In an XPath 2.0
implementation, the context item is undefined.
4.4.2 p:when
A when specifies one subpipeline guarded by a test
expression.
<p:when
  test = XPathExpression>
    (p:xpath-context?,
     (p:output |
      p:log)*,
     subpipeline)
</p:when>
Each
p:when
branch of the
p:choose
has a
test
attribute which
must
contain an XPath expression. That XPath
expression's effective boolean value is the guard for the
subpipeline
contained within that
p:when.
The
p:when
can specify a context
node against which its
test
expression is to be evaluated. That context node is specified as a
connection
for the
p:xpath-context
. If no context is specified
on the
p:when
, the context of the
p:choose
is
used.
4.4.3 p:otherwise
An otherwise specifies the default branch; the subpipeline
selected if no test expression on any preceding
p:when
evaluates to
true.
<p:otherwise>
    ((p:output |
      p:log)*,
     subpipeline)
</p:otherwise>
4.4.4 Example
A p:choose
might test the version attribute of the document element and
validate with an appropriate schema.
Example 7. A Sample Choose
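A choose of this kind might be sketched as follows. The schema file names, the error code NOVERSION, and the message text are hypothetical; the sketch only illustrates guarded branches followed by a default branch.

```xml
<p:choose name="version" xmlns:p="http://www.w3.org/ns/xproc">
  <!-- Branches are considered in order; only the first whose test
       is true is evaluated. -->
  <p:when test="/*[@version = 2]">
    <p:validate-with-xml-schema>
      <p:input port="schema">
        <p:document href="v2schema.xsd"/>
      </p:input>
    </p:validate-with-xml-schema>
  </p:when>

  <p:when test="/*[@version = 1]">
    <p:validate-with-xml-schema>
      <p:input port="schema">
        <p:document href="v1schema.xsd"/>
      </p:input>
    </p:validate-with-xml-schema>
  </p:when>

  <!-- Some other version: pass the document through unvalidated. -->
  <p:when test="/*[@version]">
    <p:identity/>
  </p:when>

  <!-- Default branch: no version attribute at all. -->
  <p:otherwise>
    <p:error code="NOVERSION">
      <p:input port="source">
        <p:inline>
          <message>Required version attribute missing.</message>
        </p:inline>
      </p:input>
    </p:error>
  </p:otherwise>
</p:choose>
```

Every branch ends with a step that has a primary output port, so all of the subpipelines expose the same outputs, as the p:choose requires.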
4.5 p:group
A group is specified by the
p:group
element. In a
p:try
, it is a non-step wrapper; everywhere
else, it is a
compound step
. A group encapsulates
the behavior of its
subpipeline.
<p:group
  name? = NCName>
    ((p:output |
      p:log)*,
     subpipeline)
</p:group>
A p:group
is a convenience wrapper
for a collection of steps.
4.5.1 Example
Example 8. An Example Group
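A group might be sketched as follows, here used to limit the scope of a variable. The variable name db-key, its value, the parameter name key, and the stylesheet fo.xsl are all hypothetical.

```xml
<p:group xmlns:p="http://www.w3.org/ns/xproc">
  <!-- The variable is visible only to the steps inside this group. -->
  <p:variable name="db-key"
              select="'some-long-string-of-nearly-random-characters'"/>

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="fo.xsl"/>
    </p:input>
    <!-- Pass the variable's value to the stylesheet as a parameter. -->
    <p:with-param name="key" select="$db-key"/>
  </p:xslt>
</p:group>
```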
4.6 p:try
A try/catch is specified by the
p:try
element. It is a
multi-container
step
that isolates a
subpipeline
, preventing any dynamic
errors that arise within it from being exposed to the rest of the
pipeline.
<p:try
  name? = NCName>
    (p:variable*,
      p:group,
      p:catch)
</p:try>
The
p:group
represents the initial subpipeline and the recovery (or “catch”)
pipeline is identified with a
p:catch
element.
The
p:try
step evaluates the
initial subpipeline and, if no errors occur, the outputs of that
pipeline are the outputs of the
p:try
step. However, if any errors occur, the
p:try
abandons the first subpipeline, discarding
any output that it might have generated, and evaluates the recovery
subpipeline.
If the recovery subpipeline is evaluated, the outputs of the
recovery subpipeline are the outputs of the
p:try
step. If the recovery subpipeline is
evaluated and a step within that subpipeline fails, the
p:try
fails.
The outputs of the
p:try
are taken
from the outputs of the initial subpipeline or the recovery
subpipeline if an error occurred in the initial subpipeline. The
p:try
has the same number of outputs
as the selected subpipeline with the same names. If the selected
subpipeline has a
primary output port
, the port
with the same name on the
p:try
is
also a primary output port.
In order to ensure that the output of the
p:try
is consistent irrespective of whether the
initial subpipeline provides its output or the recovery subpipeline
does, both subpipelines must declare the same number of outputs
with the same names. If either of the subpipelines specifies a
primary
output port
, both subpipelines must specify exactly the
same output as primary.
It is a static error (err:XS0009) if the
p:group
and
p:catch
subpipelines declare different outputs.
As a convenience to authors, it is not an error if an output
port can produce a sequence in the initial subpipeline but not in
the recovery subpipeline, or vice versa. Each output of the
p:try
is declared to produce a
sequence if that output is declared to produce a sequence in either
of its subpipelines.
A pipeline author can cause an error to occur with the
p:error
step.
The recovery subpipeline of a
p:try
is identified with a
p:catch:
<p:catch
  name? = NCName>
    ((p:output |
      p:log)*,
     subpipeline)
</p:catch>
The environment inherited by the
contained steps
of the
p:catch
is the
inherited
environment
with this modification:
The port named “error” on the p:catch is added to the readable ports.
What appears on the
error
output port is
an
error document
. The error document may
contain messages generated by steps that were part of the initial
subpipeline. Not all messages that appear are indicative of errors;
for example, it is common for all
xsl:message
output from the XSLT component to
appear on the
error
output port. It is
possible that the component which fails may not produce any
messages at all. It is also possible that the failure of one
component may cause others to fail so that there may be multiple
failure messages in the document.
4.6.1 The Error Vocabulary
In general, it is very difficult to predict error behavior. Step
failure may be catastrophic (programmer error), or it may be the
result of user error, resource failures, etc. Steps may detect more
than one error, and the failure of one step may cause other steps
to fail as well.
The
p:try/p:catch
mechanism gives pipeline authors the
opportunity to process the errors that caused the
p:try
to fail. In order
to facilitate some modicum of interoperability among processors,
errors that are reported on the
error
output port of a
p:catch
should
conform to the format described here.
4.6.1.1 c:errors
The error vocabulary consists of a root element,
c:errors,
which contains zero or more
c:error
elements.
<c:errors>
    c:error*
</c:errors>
4.6.1.2 c:error
Each specific error is represented by a
c:error
element:
<c:error
  name? = NCName
  type? = QName
  code? = QName
  href? = anyURI
  line? = integer
  column? = integer
  offset? = integer>
    (string |
     anyElement)*
</c:error>
The
name
and
type
attributes identify the name and type,
respectively, of the step which failed.
The
code
is a QName which
identifies the error. For steps which have defined error codes,
this is an opportunity for the step to identify the error in a
machine-processable fashion. Many steps omit this because they do
not include the concept of errors identified by QNames.
If the error was caused by a specific document, or by the
location of some erroneous construction in a specific document, the
href, line, column, and offset
attributes identify this
location. Generally, the error location is identified either with
line and column numbers or with an offset from the beginning of the
document, but not usually both.
The content of the
c:error
element
is any well-formed XML. Specific steps, or specific
implementations, may provide more detail about the format of the
content of an error message.
4.6.1.3 Error Example
Consider the following XSLT stylesheet:
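A minimal sketch of such a stylesheet, assuming one whose only template stops with a terminating xsl:message (the message wording is illustrative):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:template match="/">
    <!-- terminate="yes" makes the transformation, and so the step, fail. -->
    <xsl:message terminate="yes">
      <xsl:text>This stylesheet is </xsl:text>
      <emph>pointless</emph>
      <xsl:text>.</xsl:text>
    </xsl:message>
  </xsl:template>
</xsl:stylesheet>
```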
If it was used in a step named “xform” in a
p:try
, the following
error document might be produced:
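Assuming the stylesheet stops with a terminating xsl:message, and assuming it is stored as style.xsl with the message reported at line 6 (both values are hypothetical), the error document could resemble the following sketch. Processors are free to report different details.

```xml
<c:errors xmlns:c="http://www.w3.org/ns/xproc-step"
          xmlns:p="http://www.w3.org/ns/xproc">
  <!-- name and type identify the failed step; href and line locate
       the construct that caused the failure. -->
  <c:error name="xform" type="p:xslt"
           href="style.xsl" line="6">This stylesheet is <emph>pointless</emph>.</c:error>
</c:errors>
```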
It is not an error for steps to generate non-standard error
output as long as it is well-formed.
4.6.2 Example
A pipeline might attempt to process a document by dispatching it
to some web service. If the web service succeeds, then those
results are passed to the rest of the pipeline. However, if the web
service cannot be contacted or reports an error, the
p:catch
step can
provide some sort of default for the rest of the pipeline.
Example 9. An Example Try/Catch
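A try/catch of this kind might be sketched as follows. The service URI, the request body, and the fallback document are assumptions made for illustration.

```xml
<p:try xmlns:p="http://www.w3.org/ns/xproc"
       xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- Initial subpipeline: call the (hypothetical) web service. -->
  <p:group>
    <p:http-request>
      <p:input port="source">
        <p:inline>
          <c:request method="post" href="http://example.com/form-action">
            <c:body content-type="application/x-www-form-urlencoded">name=W3C&amp;spec=XProc</c:body>
          </c:request>
        </p:inline>
      </p:input>
    </p:http-request>
  </p:group>

  <!-- Recovery subpipeline: evaluated only if the group fails.
       It supplies a default document for the rest of the pipeline. -->
  <p:catch>
    <p:identity>
      <p:input port="source">
        <p:inline>
          <c:error>HTTP Request Failed</c:error>
        </p:inline>
      </p:input>
    </p:identity>
  </p:catch>
</p:try>
```

Both subpipelines end with a step that has a primary output port, so the p:group and the p:catch expose the same outputs.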
4.7 Atomic Steps
In addition to the six step types described in the preceding
sections, XProc provides a standard library of atomic step types.
The full vocabulary of standard steps is described in
Section 7,
“Standard Step Library”.
All of the standard, atomic steps are invoked in the same
way:
<p:atomic-step
  name? = NCName>
    (p:input |
     p:with-option |
     p:with-param |
     p:log)*
</p:atomic-step>
Where “p:atomic-step”
must
be in the XProc namespace and
must
be declared in either the standard library
for the XProc version supported by the processor or explicitly
imported by the surrounding pipeline (see
Section 2.13, “Versioning
Considerations”
).
4.8 Extension Steps
Pipeline authors may also have access to additional steps not
defined or described by this specification. Atomic extension steps
are invoked just like standard steps:
<pfx:atomic-step
  name? = NCName>
    (p:input |
     p:with-option |
     p:with-param |
     p:log)*
</pfx:atomic-step>
Extension steps
must not
be in the
XProc namespace and there
must
be a
visible
step
declaration at the point of use (see
Section 3.2, “Scoping of Names”
).
If the relevant step declaration has no
subpipeline
, then
that step invokes the declared atomic step, which the processor
must know how to perform. These steps are implementation-defined
extensions.
If the relevant step declaration has a
subpipeline
, then
that step runs the declared subpipeline. These steps are user- or
implementation-defined extensions. Pipelines can refer to
themselves (recursion is allowed), to pipelines defined in imported
libraries, and to other pipelines in the same library if they are
in a library.
It is a static error (err:XS0010) if a pipeline contains a step
whose specified inputs, outputs, and options do not
match
the
signature
for steps of
that type.
It is a dynamic error (err:XD0017) if the running pipeline attempts
to invoke a step which the processor does not know how to
perform.
The presence of other
compound
steps
is
implementation-defined
; XProc
provides no standard mechanism for defining them or describing what
they can contain.
It is a static error (err:XS0048) to use a
declared step as a
compound step.
4.8.1 Syntactic Shortcut for Option Values
Namespace qualified attributes on a step are
extension
attributes
. Attributes, other than
name
, that are not namespace qualified are
treated as a syntactic shortcut for specifying the value of an
option. In other words, the following two steps are equivalent:
The first step uses the standard
p:with-option
syntax:
The second step uses the syntactic shortcut:
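Both forms can be sketched in one small pipeline. The choice of the standard p:delete step and its match option is an assumption made purely for illustration; the second step repeats the first only to show the alternative syntax.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Long form: the value is given by an XPath expression,
       here a string literal. -->
  <p:delete>
    <p:with-option name="match" select="'comment()'"/>
  </p:delete>

  <!-- Shortcut form: the attribute value is the option value itself,
       so no extra quoting is needed. -->
  <p:delete match="comment()"/>
</p:pipeline>
```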
Note that there are significant limitations to this shortcut
syntax:
It only applies to option names that are not in a namespace.
It only applies to option names that are not otherwise used on
the step, such as “
name
”.
It can only be used to specify a constant value. Options that
are computed at runtime must be written using the longer form.
It is a static error (err:XS0027) if an option is specified with
both the shortcut form and the long form.
It is a static error (err:XS0031) to use an option on an
atomic step
that is
not declared on steps of that type.
The syntactic shortcuts apply equally to standard atomic steps
and extension atomic steps.
5 Other pipeline elements
5.1 p:input
A p:input
identifies an input port
for a step. In some contexts,
p:input
declares that a port with the specified name exists and identifies
the properties of that port. In other contexts, it provides a
connection for a port declared elsewhere. And in some contexts, it
does both. The semantics of
p:input
are complicated further by the fact that there are two kinds of
inputs, ordinary “document” inputs and “parameter” inputs.
5.1.1 Document Inputs
The declaration of a document input identifies the name of the
port, whether or not the port accepts a sequence, whether or not
the port is a
primary input port
, and may
provide a default connection for the port.
An input
declaration
has the following form:
<p:input
  port = NCName
  sequence? = boolean
  primary? = boolean
  kind? = "document"
  select? = XPathExpression>
    (p:empty |
      (p:document |
       p:inline |
       p:data)+)?
</p:input>
The
port
attribute defines the
name of the port.
It is a static error (err:XS0011) to
identify two ports with the same name on the same step.
The
sequence
attribute determines
whether or not a sequence of documents is allowed on the port.
If
sequence
is not specified, or has the value
false, then it is a dynamic error (err:XD0006) unless
exactly one document appears on the declared port.
The
primary
attribute is used to
identify the
primary input port
. An input port
is a
primary
input port
if
primary
is
specified with the value
true
or if the
step has only a single input port and
primary
is not specified.
It is a static error (err:XS0030) to specify that more than one
input port is the primary.
The
kind
attribute distinguishes
between the two kinds of inputs: document inputs and parameter
inputs. An input port is a document input port if
kind
is specified with the value “
document
” or if
kind
is not specified.
If a connection is provided in the declaration, then
select
may be used to select a portion of the
input identified by the
p:empty, p:document, p:data, or p:inline
elements in the
p:input
. This select
expression applies
only
if the default connection is used.
If an explicit connection is provided by the caller, then the
default select expression is ignored.
Note
The
p:pipe
element is explicitly excluded from a declaration because it would
make the default value of an input dependent on the execution of
some part of the pipeline. Default values are designed so that they
can be computed statically.
On a
p:declare-step
for an atomic step, the
p:input
simply
declares the input port.
It is a static error (err:XS0042) to
attempt to provide a connection for an input port on the
declaration of an atomic step.
An input
connection
has the following form:
<p:input
  port = NCName
  select? = XPathExpression>
    (p:empty |
      (p:pipe |
       p:document |
       p:inline |
       p:data)+)?
</p:input>
If no connection is provided for a
primary input
port
, the input will be connected to the
default readable
port.
It is a static error (err:XS0032) if no
connection is provided and the
default readable port
is
undefined.
A select
expression
may
also be provided with a connection. The
select
expression, if specified,
applies the specified XPath select expression to the document(s)
that are read. Each selected node is wrapped in a document (unless
it is a document) and provided to the input port. In other
words,
provides a single document, but
provides a sequence of zero or more documents, one for each
html:div
in
. (Note that in the case of
nested
html:div
elements, this may result in
the same content being returned in several documents.)
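The difference can be sketched with a single connection. The document URI is hypothetical, and the port name source is an assumption.

```xml
<!-- Without the select attribute, this connection would provide exactly
     one document: the whole of input.html. With it, the connection
     provides one document per html:div, each wrapped in its own
     document node. A div nested inside another div is delivered both
     as part of its ancestor's document and as a document of its own. -->
<p:input port="source" select="//html:div"
         xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:html="http://www.w3.org/1999/xhtml">
  <p:document href="http://example.org/input.html"/>
</p:input>
```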
A select expression can equally be applied to input read from
another step. This input:
provides a sequence of zero or more documents, one for each
html:div
in the document (or each of the
documents) that is read from the
result
port of the step named
origin.
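An input of this kind might be sketched as follows. The port name source is an assumption; the step name origin and its result port are the ones named above.

```xml
<!-- The select expression is applied to every document read from the
     result port of the step named "origin". -->
<p:input port="source" select="//html:div"
         xmlns:p="http://www.w3.org/ns/xproc"
         xmlns:html="http://www.w3.org/1999/xhtml">
  <p:pipe step="origin" port="result"/>
</p:input>
```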
The base URI of the document that results from a select
expression is the base URI of the matched element or document.
It is a dynamic error (err:XD0016) if the
select
expression on a
p:input
returns atomic
values or anything other than element or document nodes (or an
empty sequence).
An input declaration may include a default connection. If no
connection is provided for an input port which has a default
connection, then the input is treated as if the default connection
appeared.
A default connection does not satisfy the requirement that a
primary input port is automatically connected by the processor, nor
is it used when no default readable port is defined. In other
words, a
p:declare-step
or a
p:pipeline
can
define defaults for all of its inputs, whether they are primary or
not, but defining a default for a primary input usually has no
effect. It's never used by an atomic step since the step, when it's
called, will always connect the primary input port to the default
readable port (or cause a static error). The only case where it has
value is on a
p:pipeline
when that pipeline is invoked
directly by the processor. In that case, the processor
must
use the default connection if no external
connection is provided for the port.
5.1.2 Parameter Inputs
The declaration of a parameter input identifies the name of the
port and that the port is a parameter input.
<p:input
  port = NCName
  sequence? = boolean
  primary? = boolean
  kind = "parameter">
    (p:empty |
      (p:document |
       p:inline)+)?
</p:input>
The
port
attribute defines the
name of the port.
It is a static error (err:XS0011) to
identify two ports with the same name on the same step.
The
sequence
attribute determines
whether or not a sequence of documents is allowed on the port. A
sequence of documents is always allowed on a parameter input port.
It is a static error (err:XS0040) to specify any value other than
true.
The
primary
attribute is used to
identify the
primary parameter input
port
. An input port is a
primary parameter input
port
if it is a
parameter input port
and
primary
is specified with the value
true
or if the step has only a single
parameter input port and
primary
is
not specified.
It is a static error (err:XS0030) to
specify that more than one parameter input port is the primary.
The
kind
attribute distinguishes
between the two kinds of inputs: document inputs and parameter
inputs. An input port is a parameter input port only if the
kind
attribute is specified with the
value “parameter”.
It is a static error (err:XS0033) to
specify any kind of input other than “document” or “parameter”.
A parameter input port is a distinguished kind of input port. It
exists only to receive computed parameters; if a step does not have
a parameter input port then it cannot receive parameters. A
parameter input port must satisfy all the constraints of a normal,
document input port.
It is a static error (err:XS0035) if the declaration of a parameter
input port contains a connection; parameter input port declarations
must be empty.
When used on a step, parameter input ports are connected just
like ordinary document ports. Parameter input ports always accept a
sequence of documents. If no explicit connection is provided for a
primary parameter input
port
, then the port will be connected to the primary
parameter input port of the pipeline which contains the step. If no
connection is provided for a parameter input port other than the
primary parameter input port, then the port will be connected to an
empty
sequence
of documents.
It is a static error (err:XS0055) if a primary parameter input port
is unconnected and the pipeline that contains the step has no
primary parameter input port unless at least one explicit
p:with-param
is
provided for that port.
In other words, it is an error to leave a parameter input port
unconnected, but any parameter passed explicitly to that port
satisfies the connection requirement.
This is an error:
The parameter input port on the
p:xslt
step has no connection and the
pipeline that contains the step has no primary parameter input
port.
This is not an error:
Explicitly setting the “
mode
” parameter
satisfies the binding requirement. No other parameters can be
passed to the
p:xslt
step.
This, also, is not an error:
The parameter input port on the
p:xslt
step is bound to the primary
parameter input port of the pipeline. The
p:xslt
step will receive
the “
mode
” parameter and any other
parameters passed to the pipeline.
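The three cases discussed above can be sketched as follows. The stylesheet URI and the parameter value 'dev' are assumptions made for illustration. First, the erroneous case, in which the declared step has no parameter input port and nothing is passed to p:xslt:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- Static error err:XS0055: the parameters port of p:xslt has no
       connection and the containing pipeline has no parameter input. -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="http://example.com/stylesheets/doc.xsl"/>
    </p:input>
  </p:xslt>
</p:declare-step>
```

Second, the same step with one explicit parameter, which satisfies the requirement:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="http://example.com/stylesheets/doc.xsl"/>
    </p:input>
    <!-- The only parameter the stylesheet can receive. -->
    <p:with-param name="mode" select="'dev'"/>
  </p:xslt>
</p:declare-step>
```

Third, a p:pipeline, which has a primary parameter input port by default:

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="http://example.com/stylesheets/doc.xsl"/>
    </p:input>
    <!-- Combined with whatever parameters are passed to the pipeline. -->
    <p:with-param name="mode" select="'dev'"/>
  </p:xslt>
</p:pipeline>
```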
If a parameter input port on a
p:pipeline
is not connected, it is treated
as if it was connected to an automatically created
p:sink
step. In other
words, if a
p:pipeline
does not contain any steps that
have parameter input ports, or if those ports are all explicitly
connected elsewhere, the parameter input port is ignored. In this
one case, it is not an error for an input port to be
unconnected.
A step which accepts a parameter input reads all of the
documents presented on that port, using each
c:param
(either at the
root or inside the
c:param-set
) to establish the value of the
named parameter. If the same name appears more than once, the last
value specified is used. If the step also has literal
p:with-param
elements, they are also considered in document order. In other
words,
p:with-param
elements that appear before the
parameter input may be overridden by the computed parameters;
p:with-param
elements that appear after may
override the computed values.
If a connection is manufactured for a primary parameter input
port, that connection occurs logically last among the other
parameters, options, and connections passed to the step. In other
words, the parameter values that appear on that port will be used
even if other values were specified with
p:with-param
elements. Users can change this priority by making the connection
explicit and placing any
p:with-param
elements that they wish to
function as overrides after the connection.
All of the documents that appear on a parameter input must
either be
c:param
documents or
c:param-set
documents.
Consider the example in
Example 10, “A Parameter
Example”.
Example 10. A Parameter Example
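A pipeline matching the discussion that follows might look like this sketch. The stylesheet URI is hypothetical; the pipeline name main, the literal output-type value html, and the placement of the parameters connection after the p:with-param are taken from the description.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            name="main" version="1.0">
  <p:xslt>
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="http://example.com/stylesheets/doc.xsl"/>
    </p:input>
    <!-- Literal parameter: appears first, so it can be overridden. -->
    <p:with-param name="output-type" select="'html'"/>
    <!-- Computed parameters: appear last, so their values win. -->
    <p:input port="parameters">
      <p:pipe step="main" port="parameters"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```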
This
p:pipeline
declares that it accepts
parameters. Suppose that (through some
implementation-defined
mechanism) I have passed the parameters “output-type=fo” and
“profile=unclassified” to the pipeline. These parameters are
available on the
parameters
input
port.
When the XSLT step runs, it will read those parameters and
combine them with any parameters specified literally on the step.
Because the parameter input comes
after
the literal
declaration for
output-type
on the step,
the XSLT stylesheet will see both values that I passed in
(“output-type=fo” and “profile=unclassified”).
If the parameter input came
before
the literal
declaration, then the XSLT stylesheet would see “output-type=html” and
“profile=unclassified”.
Most steps don't bother to declare parameter inputs, or provide
explicit connections for them, and “the right thing” usually
happens.
5.1.2.1 The c:param element
A c:param
represents a parameter on
a parameter input.
<c:param
  name = QName
  namespace? = anyURI
  value = string />
The
name
attribute of the
c:param
must have the lexical form of
a QName.
If the
namespace
attribute is
specified, then the expanded name of the parameter is constructed
from the specified namespace and the
name
value.
It is a dynamic error (err:XD0025) if the
namespace
attribute is specified,
the
name
contains a colon, and the
specified namespace is not the same as the in-scope namespace
binding for the specified prefix.
If the
namespace
attribute is not
specified, and the
name
contains a
colon, then the expanded name of the parameter is constructed using
the
name
value and the namespace
declarations in-scope on the
c:param
element.
If the
namespace
attribute is not
specified, and the
name
does not
contain a colon, then the expanded name of the parameter is in no
namespace.
Any namespace-qualified attribute names that appear on the
c:param
element are ignored.
It is a dynamic error (err:XD0014) for any unqualified attribute
names other than “
name
”, “
namespace
”, or “
value
” to
appear on a
c:param
element.
5.1.2.2 The c:param-set element
A c:param-set
represents a set of
parameters on a parameter input.
<c:param-set>
    c:param*
</c:param-set>
The
c:param-set
contains zero or
more
c:param
elements.
It is a dynamic error (err:XD0018) if the parameter list contains
any elements other than
c:param.
Any namespace-qualified attribute names that appear on the
c:param-set
element are ignored.
It is a dynamic error (err:XD0014) for any unqualified attribute
names to appear on a
c:param-set
element.
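A parameter document might be sketched as follows. The parameter names, values, and the ex namespace are hypothetical; the three c:param elements illustrate the three ways in which the expanded name of a parameter is determined.

```xml
<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step"
             xmlns:ex="http://example.com/ns">
  <!-- No namespace attribute and no colon: the name is in no namespace. -->
  <c:param name="output-type" value="fo"/>

  <!-- No namespace attribute, but a prefix: the in-scope declaration
       for "ex" supplies the namespace. -->
  <c:param name="ex:profile" value="unclassified"/>

  <!-- Explicit namespace attribute: the expanded name is built from
       that namespace and the name value. -->
  <c:param name="draft" namespace="http://example.com/ns" value="yes"/>
</c:param-set>
```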
5.2 p:iteration-source
A p:iteration-source
identifies
input to a
p:for-each.
<p:iteration-source
  select? = XPathExpression>
    (p:empty |
      (p:pipe |
       p:document |
       p:inline |
       p:data)+)?
</p:iteration-source>
The
select
attribute and
connection
of a
p:iteration-source
work the same
way that they do in a
p:input.
5.3 p:viewport-source
A p:viewport-source
identifies
input to a
p:viewport.
<p:viewport-source>
    (p:pipe |
      p:document |
      p:inline |
      p:data)?
</p:viewport-source>
Only one
connection
is allowed and it works the
same way that connections work on a
p:input.
It is a dynamic error (err:XD0003) unless
exactly one document appears on the
p:viewport-source
. No
select
expression is allowed.
5.4 p:output
A p:output
identifies an output
port, optionally connecting an input for it, if necessary.
<p:output
  port = NCName
  sequence? = boolean
  primary? = boolean />
The
port
attribute defines the
name of the port.
It is a static error (err:XS0011) to
identify two ports with the same name on the same step.
An output declaration can indicate if a sequence of documents is
allowed to appear on the declared port. If
sequence
is specified with the value
true
, then a sequence is allowed.
If
sequence
is not specified on
p:output
, or has the value false, then it is a dynamic error (err:XD0007) if the step does not produce
exactly one document on the declared port.
The
primary
attribute is used to
identify the primary output port. An output port is a primary
output port if
primary
is specified
with the value
true
or if the step has
only a single output port and primary is not specified.
It is a static error (err:XS0014) to identify more than one output
port as primary.
On
compound
steps
, the declaration
may
be
accompanied by a
connection
for the output.
<p:output
  port = NCName
  sequence? = boolean
  primary? = boolean>
    (p:empty |
      (p:pipe |
       p:document |
       p:inline |
       p:data)+)?
</p:output>
It is a static error (err:XS0029) to specify a connection for a
p:output
inside a
p:declare-step
for an atomic step.
If a connection is provided for a
p:output
, documents are
read from
that
connection and those documents form the output that
is
written
to the output port. In other words, placing a
p:document
inside a
p:output
causes the processor to
read that document
and provide it on the output port. It
does not
cause the processor to
write
the output
to that document.
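The following sketch shows both forms of output declaration on a pipeline. The type ex:with-notice and the file notice.xml are assumptions for illustration.

```xml
<p:declare-step type="ex:with-notice" version="1.0"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/ex">
  <p:input port="source"/>
  <!-- No connection: defaults to the primary output of the last step -->
  <p:output port="result" primary="true"/>
  <!-- Explicit connection: notice.xml is read and provided on this port -->
  <p:output port="notice">
    <p:document href="notice.xml"/>
  </p:output>
  <p:identity/>
</p:declare-step>
```

Nothing is written to notice.xml here; the document is only read.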
5.5 p:log
A p:log element is a debugging aid.
It associates a URI with a specific output port on a step:
<p:log
  port = NCName
  href? = anyURI />
The semantics of
p:log
are that it
writes to the specified IRI whatever document or documents appear
on the specified port.
If the href attribute is not specified, the
location of the log file or files is implementation-defined.
How each document or sequence
of documents is represented in a p:log is implementation-defined.
Pipelines are not expected to be able to consume their own logging
output.
The ability of a step to read the p:log output of some former
step is implementation-dependent.
It is a static error (err:XS0026) if the port specified on the
p:log
is not the name of an output
port on the step in which it appears or if more than one
p:log
element is applied to the same port.
Implementations may, at user option, ignore all
p:log
elements.
Note
This element represents a potential security risk: running
unexamined 3rd-party pipelines could result in vital system
resources being overwritten.
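For instance, a log could be attached to the result of a transformation as sketched below. The stylesheet and log locations are hypothetical, and a processor may ignore the p:log entirely.

```xml
<p:pipeline version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc">
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- Ask the processor to record whatever appears on the result port -->
    <p:log port="result" href="file:///tmp/xslt-result.log"/>
  </p:xslt>
</p:pipeline>
```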
5.6 p:serialization
The
p:serialization
element allows
the user to request serialization properties on a
p:pipeline
output.
<p:serialization
  port = NCName
  byte-order-mark? = boolean
  cdata-section-elements? = NMTOKENS
  doctype-public? = string
  doctype-system? = string
  encoding? = string
  escape-uri-attributes? = boolean
  include-content-type? = boolean
  indent? = boolean
  media-type? = string
  method? = QName
  normalization-form? = NFC|NFD|NFKC|NFKD|fully-normalized|none|xs:NMTOKEN
  omit-xml-declaration? = boolean
  standalone? = true|false|omit
  undeclare-prefixes? = boolean
  version? = string />
If the pipeline processor serializes the output on the specified
port, it
must
use the serialization
options specified. If the processor is not serializing (if, for
example, the pipeline has been called from another pipeline), then
the
p:serialization
must
be ignored. The processor
may
reject statically a pipeline that requests
serialization options that it cannot provide.
The default value of any
serialization options not specified on a particular
p:serialization
element is
implementation-defined
. The
allowed options are defined by [
Serialization
].
It is a dynamic error (err:XD0020) if the
combination of serialization options specified or defaulted is not
allowed. Implementations
must
check
that all of the specified serialization options are allowed if they
serialize the specified output. If the specified output is not
being serialized (because it is being returned as the result of a
call from within another pipeline, for example) implementations
may
but are not required to check that
the specified options are allowed.
The semantics of the attributes on a p:serialization are described in
Section 7.3, “Serialization Options”.
It is a static error (err:XS0039) if the port specified on the
p:serialization
is not the name of an
output port on the pipeline in which it appears or if more than one
p:serialization
element is applied to
the same port.
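A minimal sketch of a serialization request on a pipeline's result port follows. The particular option values are only examples, and they apply only if the processor actually serializes that port.

```xml
<p:pipeline version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc">
  <!-- Requested properties for the implicit "result" port of p:pipeline -->
  <p:serialization port="result"
                   method="xml"
                   encoding="UTF-8"
                   indent="true"
                   omit-xml-declaration="false"/>
  <p:identity/>
</p:pipeline>
```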
5.7 Variables, Options, and
Parameters
Variables, options, and parameters provide a mechanism for
pipeline authors to construct temporary results and hold onto them
for reuse.
Variables are created in compound steps and, like XSLT
variables, are single assignment, though they may be shadowed by
subsequent declarations of other variables with the same name.
Options can be declared on atomic or compound steps. The value
of an option can be specified by the caller invoking the step. Any
value specified by the caller takes precedence over any default
value specified in the declaration.
Parameters, unlike options and variables, have names that can be
computed at runtime. The most common use of parameters is to pass
parameter values to XSLT stylesheets.
5.7.1 p:variable
A p:variable declares a variable
and associates a value with it.
The name of the variable must be a
QName. If it does not contain a prefix then it is in no namespace.
It is a static error (err:XS0028) to declare an option or variable
in the XProc namespace.
The variable's value is specified with a
select
attribute. The
select
attribute
must
be specified. The content of the
select
attribute is an XPath expression which
will be evaluated to provide the value of the variable.
<p:variable
  name = QName
  select = XPathExpression>
    ((p:empty |
      p:pipe |
      p:document |
      p:inline |
      p:data)? &
     p:namespaces*)
</p:variable>
If a
select
expression is given,
it is evaluated as an XPath expression using the appropriate
context as described in
Section 2.6, “XPaths in XProc”
, for the
enclosing
container
, with the addition of bindings
for all preceding-sibling
p:variable
and
p:option
elements. Regardless of the implicit type of the expression, when
XPath 1.0 is being used, the string value of the expression becomes
the value of the variable; when XPath 2.0 is being used, the type
is treated as an
xs:untypedAtomic
Since all
in-scope bindings
are present in
the Processor XPath Context as variable bindings,
select
expressions may refer to the value of
in-scope
bindings
by variable reference. If a variable reference
uses a QName that is not the name of an
in-scope
binding
, an XPath evaluation error will occur.
If a
select
expression is given,
the
readable
ports
available for document connections are the
readable
ports
in the environment inherited by the first step in
the surrounding
container
's
contained steps
. However, in order
to avoid ordering paradoxes,
it is a static error (err:XS0019) for a
variable's document connection to refer to the output port of any
step in the surrounding container's contained steps.
If a
select
expression is given
but no document connection is provided, the implicit connection is
to the
default readable port
in the
environment inherited by the first step in the surrounding
container
's
contained
steps
. If there is no default readable port, the
connection is treated as if
p:empty
was specified.
It is a dynamic error (err:XD0008) if a sequence of more than one
document appears on the connection for a
p:variable
. In an XPath 1.0 implementation, if
p:empty
is
given or implied as the document connection, an
empty document node
is used as the
context node. In an XPath 2.0 implementation, the context item is
undefined.
It is a dynamic error (err:XD0026) if the
select
expression makes reference to the
context node, size, or position when the context item is
undefined.
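The sketch below declares two variables, the second referring to the first by variable reference. The element names in the select expression and the heading attribute are assumptions for illustration.

```xml
<p:pipeline version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc">
  <!-- No explicit connection: evaluated against the default readable port -->
  <p:variable name="title" select="/book/title"/>
  <!-- A preceding-sibling variable is in scope here -->
  <p:variable name="heading" select="concat('Title: ', $title)"/>
  <p:add-attribute match="/*" attribute-name="heading">
    <p:with-option name="attribute-value" select="$heading"/>
  </p:add-attribute>
</p:pipeline>
```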
5.7.2 p:option
p:option
declares an option and
may associate a default value with it. The
p:option
tag can only be used in a
p:declare-step
or a
p:pipeline
(which is a syntactic
abbreviation for a step declaration).
The name of the option
must
be a
QName. If it does not contain a prefix then it is in no namespace.
It is a static error (err:XS0028) to declare an option or variable
in the XProc namespace.
It is a static error (err:XS0004) to declare two or more options on
the same step with the same name.
<p:option
  name = QName
  required? = boolean />
An option may be declared as required.
If an option is required, it is a static error (err:XS0018) to invoke the step without
specifying a value for that option.
If an option is not declared to be required, it
may
be given a default value. The value is
specified with a
select
attribute.
<p:option
  name = QName
  required? = boolean
  select = XPathExpression />
If a
select
attribute is
specified, its content is an XPath expression which will be
evaluated to provide the value of the option, which may differ from
one instance of the step type to another.
The select expression is only
evaluated when its actual value is needed by an instance of the
step type being declared. In this case, it is evaluated as
described in Section 5.7.3, “p:with-option”, except that:
In an XPath 1.0 implementation, an empty document node is used as the
context. In an XPath 2.0 implementation, the context item is
undefined.
The variable bindings consist only of bindings for options whose
declaration precedes the p:option itself in the surrounding step
signature.
The in-scope namespaces are the in-scope namespaces of the
p:option itself.
It is a static error (err:XS0017) to specify that an option is both
required and has a default value.
It is a dynamic error (err:XD0026) if the select
expression makes reference to the
context node, size, or position.
Regardless of the implicit type of the expression, when XPath
1.0 is being used, the string value of the expression becomes the
value of the option; when XPath 2.0 is being used, the value is an
xs:untypedAtomic
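As a sketch of these rules, the declaration below has one required option and two options with defaults, the last of which refers to a preceding option. The step type and option names are hypothetical.

```xml
<p:declare-step type="ex:publish" version="1.0"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/ex">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- Callers must supply a value; no default is permitted -->
  <p:option name="target" required="true"/>
  <!-- Default is a string literal, so no context item is needed -->
  <p:option name="format" select="'html'"/>
  <!-- Defaults may refer to options declared earlier in the signature -->
  <p:option name="label" select="concat($format, '-output')"/>
  <p:identity/>
</p:declare-step>
```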
5.7.3 p:with-option
p:with-option
provides an actual
value for an option when a step is invoked.
The name of the option
must
be a
QName. If it does not contain a prefix then it is in no namespace.
It is a static error (err:XS0031) to use an option name in
p:with-option if the step type being
invoked has not declared an option with that name. (This error does
not apply for steps in the XProc namespace when the processor is
operating in forwards-compatible mode.)
It is a static error (err:XS0004) to include more than one
p:with-option
with the same option
name as part of the same step invocation.
The actual value is specified with a select attribute. The select
attribute must be specified. The value of the
select attribute is an XPath expression which
will be evaluated to provide the value of the option.
<p:with-option
  name = QName
  select = XPathExpression>
    ((p:empty |
      p:pipe |
      p:document |
      p:inline |
      p:data)? &
     p:namespaces*)
</p:with-option>
Regardless of the implicit type of the expression, when XPath
1.0 is being used, the string value of the expression becomes the
value of the option; when XPath 2.0 is being used, the value is an
xs:untypedAtomic
All
in-scope bindings
for the step
instance itself are present in the Processor XPath Context as
variable bindings, so
select
expressions may refer to any option or variable bound in those
in-scope
bindings
by variable reference. If a variable reference
uses a QName that is not the name of an
in-scope
binding
or preceding sibling option, an XPath evaluation
error will occur.
If a
select
expression is used
but no document connection is provided, the implicit connection is
to the
default readable port
. If
there is no default readable port, the connection is treated as if
p:empty
was
specified.
It is a dynamic error (err:XD0008) if a sequence of more than one
document appears on the connection for a
p:with-option
. In an XPath 1.0 implementation,
if
p:empty
is
given or implied as the document connection, an
empty document node
is used as the
context node. In an XPath 2.0 implementation, the context item is
undefined.
It is a dynamic error (err:XD0026) if the
select
expression makes reference to
the context node, size, or position when the context item is
undefined.
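A minimal sketch of an option value computed from a separate document follows. The file config.xml and its delete-pattern attribute are assumptions for this example.

```xml
<p:pipeline version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc">
  <p:delete>
    <!-- The select expression is evaluated against config.xml,
         not against the document being processed -->
    <p:with-option name="match" select="/config/@delete-pattern">
      <p:document href="config.xml"/>
    </p:with-option>
  </p:delete>
</p:pipeline>
```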
5.7.4 p:with-param
The
p:with-param
element is used to
establish the value of a parameter. The parameter
must
be given a value when it is used. (Parameter
names aren't known in advance; there's no provision for declaring
them.)
The name of the parameter
must
be a
QName. If it does not contain a prefix then it is in no namespace.
It is a dynamic error (err:XD0031) to use the XProc namespace in the
name of a parameter.
The value is specified with a select attribute. The select
attribute must be specified. The content of the
select attribute is an XPath expression which
will be evaluated to provide the value of the parameter.
<p:with-param
  name = QName
  select = XPathExpression
  port? = NCName>
    ((p:empty |
      p:pipe |
      p:document |
      p:inline |
      p:data)? &
     p:namespaces*)
</p:with-param>
The values of parameters for a step
must
be computed after all the options in the
step's
signature
have had their values computed.
If a
select
expression is given on a
p:with-param
, it is evaluated as an
XPath expression using the appropriate context as described in
Section 2.6,
“XPaths in XProc”
, for the containing step, with the addition
of variable bindings for all options declared in the containing
step's
signature
Regardless of the implicit type of the expression, when XPath
1.0 is being used, the string value of the expression becomes the
value of the parameter; when XPath 2.0 is being used, the value is
an
xs:untypedAtomic
All
in-scope bindings
for the step
instance itself are present in the Processor XPath Context as
variable bindings, so
select
expressions may refer to any option or variable bound in those
in-scope
bindings
, as well as to any option declared in the step
signature, by variable reference. If a variable reference uses a
QName that is not the name of an
in-scope binding
or declared
option, an XPath evaluation error will occur.
If a
select
expression is used
but no document connection is provided, the implicit connection is
to the
default readable port
. If
there is no default readable port, the connection is treated as if
p:empty
was
specified.
It is a dynamic error (err:XD0008) if a sequence of more than one
document appears on the connection for a
p:with-param
. In an XPath 1.0 implementation, if
p:empty
is
given or implied as the document connection, an
empty document node
is used as the
context node. In an XPath 2.0 implementation, the context item is
undefined.
It is a dynamic error (err:XD0026) if the
select
expression makes reference to
the context node, size, or position when the context item is
undefined.
If the optional port attribute is
specified, then the parameter appears on the named port, otherwise
the parameter appears on the step's primary parameter input port.
It is a static error (err:XS0034) if the
specified port is not a parameter input port or if no port is
specified and the step does not have a primary parameter input
port.
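The following sketch passes two parameters to a stylesheet. The parameter names, the stylesheet location, and the select expressions are hypothetical; the parameters port named in the second p:with-param is the parameter input port of p:xslt.

```xml
<p:pipeline version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc">
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- No port attribute: appears on the primary parameter input port -->
    <p:with-param name="output-dir" select="'out/'"/>
    <!-- Same port, named explicitly; value taken from the default readable port -->
    <p:with-param name="author" select="/doc/@author" port="parameters"/>
  </p:xslt>
</p:pipeline>
```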
5.7.5 Namespaces on variables,
options, and parameters
Variable, option and parameter values carry with them not only
their literal or computed string value but also a set of
namespaces. To see why this is necessary, consider the following
step:
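A minimal sketch of such a step, assuming the pattern is supplied as a string literal and the html prefix is declared on the p:with-option element:

```xml
<p:delete xmlns:p="http://www.w3.org/ns/xproc">
  <!-- The value is only the string 'html:div'; the prefix binding must accompany it -->
  <p:with-option name="match" select="'html:div'"
                 xmlns:html="http://www.w3.org/1999/xhtml"/>
</p:delete>
```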
The
p:delete
step will delete elements that
match the expression “
html:div
”, but that
expression can only be correctly interpreted if there's a namespace
binding for the prefix “
html
” so that
binding has to travel with the option.
The default namespace bindings associated with a variable,
option or parameter value are computed as follows:
If the
select
attribute was used
to specify the value and it consisted of a single
VariableReference
(per [
XPath 1.0
] or [
XPath 2.0
], as
appropriate), then the namespace bindings from the referenced
option or variable are used.
If the
select
attribute was used
to specify the value and it evaluated to a node-set, then the
in-scope namespaces from the first node in the selected node-set
(or, if it's not an element, its parent) are used.
The expression is evaluated in the appropriate context; see
Section 2.6,
“XPaths in XProc”.
Otherwise, the in-scope namespaces from the element providing
the value are used. (For options specified using
syntactic shortcuts
, the step element itself
is providing the value.)
The default namespace is never included in the namespace
bindings for a variable, option or parameter value. Unqualified
names are always in no-namespace.
Unfortunately, in more complex situations, there may be no
single variable, option or parameter that can reliably be expected
to have the correct set of namespace bindings. Consider this
pipeline:
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:ex="http://example.org/ns/ex"
xmlns:h="http://www.w3.org/1999/xhtml">
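A pipeline consistent with this description might look like the following sketch. The type ex:delete-in-div, the divchild option, and the h prefix come from the surrounding text; the exact select expression is an assumption.

```xml
<p:pipeline type="ex:delete-in-div" version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/ex"
            xmlns:h="http://www.w3.org/1999/xhtml">
  <p:option name="divchild" required="true"/>
  <p:delete>
    <!-- Builds a pattern that mixes the h prefix used here
         with whatever prefixes the caller used in divchild -->
    <p:with-option name="match" select="concat('h:div/', $divchild)"/>
  </p:delete>
</p:pipeline>
```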
It defines an atomic step (“
ex:delete-in-div
”) that deletes elements that occur
inside of XHTML div elements. It might be used as follows:
<ex:delete-in-div xmlns:ex="http://example.org/ns/ex"
                  xmlns:html="http://www.w3.org/1999/xhtml"
                  divchild="html:p[@class='delete']"/>
In this case, the
match
option passed
to the
p:delete
step needs
both
the
namespace binding of “h” specified in the
ex:delete-in-div
pipeline definition
and
the namespace binding of “
html
” specified in the
divchild
option on the call of that pipeline. It's
not sufficient to provide just one of the sets of bindings.
The p:namespaces element can be used as a child
of p:variable, p:with-option, or p:with-param to provide explicit
bindings.
<p:namespaces
  binding? = QName
  element? = XPathExpression
  except-prefixes? = prefix list />
The namespace bindings specified by a
p:namespaces
element are determined as follows:
If the
binding
attribute is
specified, it
must
contain the name of
a single
in-scope binding
. The namespace
bindings associated with that binding are used.
It is a static error (err:XS0020) if the binding attribute on p:namespaces is
specified and its value is not the name of an in-scope binding.
If the
element
attribute is
specified, it
must
contain an XPath
expression which identifies a single element node (the input
connection for this expression is the same as the connection for
the
p:option
or
p:with-param
which contains it). The
in-scope namespaces of that node are used.
The expression is evaluated in the appropriate context; see
Section 2.6, “XPaths in XProc”.
It is a dynamic error (err:XD0009) if the element attribute on p:namespaces is
specified and it does not identify a single element node.
If neither
binding
nor
element
is specified, the in-scope namespaces
on the
p:namespaces
element itself are used.
Irrespective of how the set of namespaces is determined, the
except-prefixes
attribute can be
used to exclude one or more namespaces. The value of the
except-prefixes attribute
must
be a
sequence of tokens, each of which
must
be a prefix bound to a namespace in the in-scope namespaces of the
p:namespaces
element. All bindings of
prefixes to each of the namespaces thus identified are excluded.
It is a static error (err:XS0051) if the
except-prefixes
attribute on
p:namespaces
does
not contain a list of tokens or if any of those tokens is not a
prefix bound to a namespace in the in-scope namespaces of the
p:namespaces
element.
It is a static error (err:XS0041) to specify both
binding
and
element
on the same
p:namespaces
element.
If a p:variable, p:with-option, or p:with-param includes one or more
p:namespaces
elements, then the union of all the namespaces specified on those
elements is used as the bindings for the variable, option or
parameter value. In this case, the in-scope namespaces on the
p:variable, p:with-option, or p:with-param are
ignored.
It is a dynamic error (err:XD0013) if the specified namespace
bindings are inconsistent; that is, if the same prefix is bound to
two different namespace names.
For example, this would allow the preceding example to work:
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:ex="http://example.org/ns/ex"
xmlns:h="http://www.w3.org/1999/xhtml">
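A sketch of the revised pipeline, assuming the same hypothetical select expression as before and an explicit declaration of the caller's html prefix on a p:namespaces child:

```xml
<p:pipeline type="ex:delete-in-div" version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/ex"
            xmlns:h="http://www.w3.org/1999/xhtml">
  <p:option name="divchild" required="true"/>
  <p:delete>
    <p:with-option name="match" select="concat('h:div/', $divchild)">
      <!-- In-scope namespaces of this element are used:
           html is declared here, h is inherited from p:pipeline -->
      <p:namespaces xmlns:html="http://www.w3.org/1999/xhtml"/>
    </p:with-option>
  </p:delete>
</p:pipeline>
```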
The
p:namespaces
element provides namespace
bindings for both of the prefixes necessary to correctly interpret
the expression ultimately passed to the
p:delete
step (the
binding for
html:
is explicitly provided
and the binding for
h:
is in-scope).
Note
The use of
p:namespaces
here, when all of the bindings
are provided with explicit namespace declarations, is unnecessary.
The bindings could simply be placed on the parent
p:with-option
element. We use
p:namespaces
here only to make the example
parallel to the one which follows.
The preceding solution has the weakness that it depends on
knowing the bindings that will be used by the caller. A more
flexible solution would use the
binding
attribute to copy the bindings from
the caller's option value.
xmlns:p="http://www.w3.org/ns/xproc"
xmlns:ex="http://example.org/ns/ex"
xmlns:h="http://www.w3.org/1999/xhtml">
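A sketch of this more flexible variant, again assuming the hypothetical select expression used earlier. One p:namespaces copies the bindings that travel with the caller's divchild value, and a second supplies the h prefix.

```xml
<p:pipeline type="ex:delete-in-div" version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/ex"
            xmlns:h="http://www.w3.org/1999/xhtml">
  <p:option name="divchild" required="true"/>
  <p:delete>
    <p:with-option name="match" select="concat('h:div/', $divchild)">
      <!-- Bindings carried by the caller's option value -->
      <p:namespaces binding="divchild"/>
      <!-- Binding for the h prefix used in this pipeline -->
      <p:namespaces xmlns:h="http://www.w3.org/1999/xhtml"/>
    </p:with-option>
  </p:delete>
</p:pipeline>
```

The union of the two sets is used, which is why a conflicting binding supplied by the caller would be an error.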
This example will succeed as long as the caller-specified option
does not bind the “h” prefix to something
other than the XHTML namespace.
5.8 p:declare-step
A p:declare-step provides the type and signature of an atomic step or
pipeline. It declares the inputs, outputs, and options for all
steps of that type.
<p:declare-step
  name? = NCName
  type? = QName
  psvi-required? = boolean
  xpath-version? = string
  exclude-inline-prefixes? = prefix list
  version? = string>
    (p:input |
     p:output |
     p:option |
     p:log |
     p:serialization)*,
    ((p:declare-step |
      p:pipeline |
      p:import)*,
     subpipeline)?
</p:declare-step>
The value of the
type
can be from
any namespace provided that the expanded-QName of the value has a
non-null namespace URI.
It is a static error (err:XS0025) if the
expanded-QName value of the
type
attribute is in no namespace or in the XProc namespace. Except as
described in
Section 2.13, “Versioning
Considerations”
, the XProc namespace
must
not
be used in the type of steps. Neither users nor
implementers may define additional steps in the XProc
namespace.
Irrespective of the context in which the
p:declare-step
occurs, there are initially no
option or variable names in-scope inside a
p:declare-step
. That is,
p:option
and
p:variable
elements
can refer to values declared by their preceding siblings, but not
by any of their ancestors.
When a declared step is
evaluated directly by the XProc processor (as opposed to occurring
as an atomic step in some
container
), how the input and output ports
are connected to documents is
implementation-defined
A step declaration is not a
step
in
its own right. Sibling steps cannot refer to the inputs or outputs
of a
p:declare-step
using
p:pipe
; only instances
of the type can be referenced.
The
version
attribute identifies
the version of XProc for which this step declaration was authored.
If the
p:declare-step
has no ancestors
in the XProc namespace, then it
must
have a
version
attribute. See
Section 2.13, “Versioning
Considerations”
For a description of
psvi-required
, see
Section 2.8, “PSVIs in XProc”
. For
xpath-version
, see
Section 2.6, “XPaths
in XProc”
. For
exclude-inline-prefixes
, see
p:inline
5.8.1 Declaring atomic steps
When declaring an atomic step, the subpipeline in the
declaration
must
be empty. And,
conversely, if the subpipeline in a declaration is empty, the
declaration
must
be for an atomic
step.
Implementations may use
extension
attributes
to provide
implementation-dependent
information about a declared step. For example, such an attribute
might identify the code which implements steps of this type.
It is not an error for a pipeline to include declarations for
steps that a particular processor does not know how to implement.
It is, of course, an error to attempt to evaluate such steps.
If
p:log
or
p:serialization
elements appear in the
declaration of an atomic step, they will only be used if the atomic
step is directly evaluated by the processor. They have no effect if
the step appears in a
subpipeline
; only the serialization
options of the “top level” step or pipeline are used because that
is the only step which the processor is required to serialize.
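As an illustration, an atomic step declaration consists of a signature and nothing else. The type ex:spell-check and its language option are hypothetical, and a processor would need some implementation-specific way to supply the step's behavior.

```xml
<p:declare-step type="ex:spell-check" version="1.0"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/ex">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="language" select="'en'"/>
  <!-- Empty subpipeline: this declares an atomic step -->
</p:declare-step>
```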
5.8.2 Declaring pipelines
When a
p:declare-step
declares a pipeline, that
pipeline encapsulates the behavior of the specified
subpipeline
. Its
children declare inputs, outputs, and options that the pipeline
exposes and identify the steps in its subpipeline.
The subpipeline may include
declarations of additional steps (e.g., other pipelines or other
step types that are provided by a particular implementation or in
some
implementation-defined
way)
and import other pipelines. If a pipeline has been imported, it may
be invoked as a step within the subpipeline that imported it.
The environment inherited by the
subpipeline
is the
empty
environment
with these modifications:
All of the declared inputs are added to the
readable ports
in
the environment.
If a
primary input port
is declared,
that port is the
default readable port
otherwise the default readable port is undefined.
If a
primary output port
is declared
and that port has no
connection
, then it is connected to the
primary
output port
of the
last step
in the
subpipeline
It is a static error (err:XS0006) if the primary output port is
unconnected and the
last step
in the subpipeline does not have
a primary output port.
The requested
xpath-version
must
be used to evaluate XPath
expressions subject to the constraints outlined in
Section 2.6, “XPaths
in XProc”
The
psvi-required
attribute
allows the author to declare that a step relies on the processor's
ability to pass PSVI annotations between steps, see
Section 2.8, “PSVIs in
XProc”
. If the attribute is not specified, the value
“false” is assumed.
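A minimal sketch of a pipeline declared with p:declare-step follows. The type name and the steps in its subpipeline are assumptions chosen for illustration.

```xml
<p:declare-step name="main" type="ex:clean" version="1.0"
                psvi-required="false"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://example.org/ns/ex">
  <!-- The primary input becomes the default readable port of the subpipeline -->
  <p:input port="source" primary="true"/>
  <!-- Unconnected primary output: connected to the last step's primary output -->
  <p:output port="result" primary="true"/>
  <p:delete match="comment()"/>
  <p:delete match="processing-instruction()"/>
</p:declare-step>
```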
5.9 p:library
A p:library is a collection of step
declarations and/or pipeline definitions.
<p:library
  psvi-required? = boolean
  xpath-version? = string
  exclude-inline-prefixes? = prefix list
  version? = string>
    (p:import |
     p:declare-step |
     p:pipeline)*
</p:library>
The
version
attribute identifies
the version of XProc for which this library was authored. If the
p:library
has no ancestors in the
XProc namespace, then it
must
have a
version
attribute. See
Section 2.13, “Versioning
Considerations”
For a description of
psvi-required
, see
Section 2.8, “PSVIs in XProc”
; for
xpath-version
, see
Section 2.6, “XPaths
in XProc”
; for
exclude-inline-prefixes
, see
p:inline
Note
The steps declared in a pipeline library are referred to by
their type. It is not an error to put a
p:pipeline
or
p:declare-step
without a
type
in a
p:library
, but there is no standard mechanism
for instantiating it or referring to it. It is effectively
invisible.
Libraries can import pipelines and/or other libraries. See also
Appendix G,
Handling Circular and Re-entrant Library Imports
(Non-Normative)
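The sketch below shows a small library. The step types and the imported file common.xpl are hypothetical.

```xml
<p:library version="1.0"
           xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:ex="http://example.org/ns/ex">
  <p:import href="common.xpl"/>
  <!-- Each step is given a type so that importers can refer to it -->
  <p:declare-step type="ex:strip-comments">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:delete match="comment()"/>
  </p:declare-step>
  <p:pipeline type="ex:copy">
    <p:identity/>
  </p:pipeline>
</p:library>
```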
5.10 p:import
A p:import loads a pipeline or
pipeline library, making it available in the pipeline or library
which contains the p:import.
<p:import
  href = anyURI />
An import statement loads the specified IRI and makes any
pipelines declared within it available to the current pipeline.
It is a static error (err:XS0052) if the URI of a
p:import cannot be retrieved or if, once
retrieved, it does not point to a p:library, p:declare-step, or p:pipeline.
It is a static error (err:XS0053) to
import a single pipeline if that pipeline does not have a type.
Attempts to retrieve the library identified by the URI value may
be redirected at the parser level (for example, in an entity
resolver) or below (at the protocol level, for example, via an HTTP
Location: header). In the absence of additional information outside
the scope of this specification within the resource, the base URI
of the library is always the URI of the actual resource returned.
In other words, it is the URI of the resource retrieved after all
redirection has occurred.
As imports are processed, a processor may encounter new
p:import
elements whose library URI is
the same as one it has already processed in some other context.
This may happen as a consequence of resolving the URI. If the
actual base URI is the same as one that has already been processed,
the implementation must recognize it as the same library and should
not need to process the resource. Also, a duplicate, circular chain
of imports, or a re-entrant import is not an error and
implementations must take the necessary steps to avoid infinite
loops and/or incorrect notification of duplicate step definitions.
It is not an error for a library to import itself. An example of
such steps is listed in
Appendix G,
Handling Circular and Re-entrant Library Imports
(Non-Normative)
A library is considered the same library if the URI of the
resource retrieved is the same. If a pipeline or library author
uses two different URI values that resolve to the same resource,
they must not be considered the same imported library.
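For illustration, a pipeline might import a library and invoke one of its steps as sketched here. The file library.xpl and the step type ex:strip-comments are assumptions; the library is presumed to declare a step of that type.

```xml
<p:pipeline version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/ex">
  <p:import href="library.xpl"/>
  <!-- Imported steps are invoked by their type -->
  <ex:strip-comments/>
</p:pipeline>
```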
5.11 p:pipe
A p:pipe connects an input to a
port on another step.
<p:pipe
  step = NCName
  port = NCName />
The
p:pipe
element connects to a
readable port of another step. It identifies the readable port to
which it connects with the name of the step in the
step
attribute and the name of the port on
that step in the
port
attribute.
In all cases except the p:output of a compound step, it is a
static error (err:XS0022) if the port identified by a
p:pipe is not in the readable ports of
the step that contains the p:pipe.
A p:pipe that is a connection for a p:output of a compound step
may connect to one of the readable ports of the
compound step or to an output port on one of the compound step's
contained
steps
. In other words, the output of a compound step can
simply be a copy of one of the available inputs or it can be the
output of one of its children.
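A minimal sketch of explicit connections follows, assuming hypothetical step names main and expand. The final step reads both the pipeline's original input and the output of an earlier step.

```xml
<p:pipeline name="main" version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc">
  <p:xinclude name="expand"/>
  <p:wrap-sequence wrapper="bundle">
    <p:input port="source">
      <!-- The pipeline's own input port is readable from its subpipeline -->
      <p:pipe step="main" port="source"/>
      <!-- The result port of a preceding sibling step -->
      <p:pipe step="expand" port="result"/>
    </p:input>
  </p:wrap-sequence>
</p:pipeline>
```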
5.12 p:inline
A p:inline provides a document
inline.
<p:inline
  exclude-inline-prefixes? = prefix list>
    anyElement
</p:inline>
The content of the
p:inline
element
is wrapped in a document node and passed as input. The base URI of
the document is the base URI of the
p:inline
element.
It is a static error (err:XS0024) if the
content of the
p:inline
element does
not consist of exactly one element, optionally preceded and/or
followed by any number of processing instructions, comments or
whitespace characters.
The in-scope namespaces of the inline document differ from the
in-scope namespace of the content of the
p:inline
element in that bindings for all its
excluded namespaces
, as defined below, are removed:
The XProc namespace itself (http://www.w3.org/ns/xproc) is excluded.
A namespace URI designated by using an
exclude-inline-prefixes
attribute on the
enclosing
p:inline
is excluded.
A namespace URI designated by using an
exclude-inline-prefixes
attribute on any
ancestor
p:declare-step
p:pipeline
, or
p:library
is
also excluded. (In other words, the effect of several
exclude-inline-prefixes
attributes among the
ancestors of
p:inline
is
cumulative.)
The value of each
exclude-inline-prefixes
attribute is
interpreted as follows:
The value of the attribute is either
#all
, or a whitespace-separated list of tokens, each
of which is either a namespace prefix or
#default
. The namespace bound to each of the
prefixes is designated as an excluded namespace.
It is a static error (err:XS0057) if the
exclude-inline-prefixes attribute does not
contain a list of tokens or if any of those tokens (except
#all or #default)
is not a prefix bound to a namespace in the in-scope namespaces of
the element on which it occurs.
The default namespace of the element on which
exclude-inline-prefixes
occurs may be
designated as an excluded namespace by including
#default
in the list of namespace prefixes.
It is a static error (err:XS0058) if the value
#default
is used within the
exclude-inline-prefixes
attribute and there is
no default namespace in scope.
The value
#all
indicates that all
namespaces that are in scope for the element on which
exclude-inline-prefixes
occurs are designated
as excluded namespaces.
The XProc processor
must
include
all in-scope prefixes that are not explicitly excluded. If the
namespace associated with an excluded prefix is used in the
expanded-QName of a descendant element or attribute, the processor
may
include that prefix anyway, or it
may generate a new prefix.
Consider this example:
xmlns:c="http://example.com/c">
which might produce a result like this:
The declaration for “c” must be present
because it was not excluded. The “part”
element uses the namespace bound to “b”,
so
some
binding must be present. In this example, the
original prefix has been preserved, but it would be equally correct
if a different prefix had been used.
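An example of this kind can be sketched as follows. The namespace URIs for the a and b prefixes and the doc and part element names are assumptions made for illustration.

```xml
<p:pipeline version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:a="http://example.com/a"
            xmlns:b="http://example.com/b"
            xmlns:c="http://example.com/c">
  <p:identity>
    <p:input port="source">
      <!-- a and b are excluded; c is not; p is always excluded -->
      <p:inline exclude-inline-prefixes="a b">
        <doc>
          <b:part/>
        </doc>
      </p:inline>
    </p:input>
  </p:identity>
</p:pipeline>
```

One result a processor could legitimately produce keeps the unexcluded binding on the root and reintroduces a binding only where the excluded namespace is actually used:

```xml
<doc xmlns:c="http://example.com/c">
  <b:part xmlns:b="http://example.com/b"/>
</doc>
```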
5.13 p:document
A p:document reads an XML document
from a URI.
<p:document
  href = anyURI />
The document identified by the URI in the
href
attribute is loaded and returned. If the
URI protocol supports redirection, then redirects
must
be followed.
It is a dynamic error (err:XD0011) if the resource referenced by a
p:document
element does not exist,
cannot be accessed, or is not a well-formed XML document.
The parser which the
p:document
element employs
must
process the
external subset; all general and external parsed entities
must
be fully expanded. It
may
perform
xml:id
processing, but it
must not
perform any other processing, such as
expanding XIncludes. The parser
must
be conformant to Namespaces in XML.
Loading the document must not fail due
to validation errors.
Use the
p:load
step if you need to perform DTD-based
validation.
Note
p:document
always
reads
from the specified IRI. In the context of a
p:input
, this seems
perfectly natural. In the context of a
p:output
, this may
seem a little asymmetrical. Putting a
p:document
in a
p:output
causes the pipeline to
read
from the specified IRI and provide that document
as an output
on that port.
Use
p:store
to store the results that appear on a
p:output
5.14 p:data
p:data
reads an arbitrary
resource from a URI.
<p:data
  href = anyURI
  wrapper? = QName
  wrapper-prefix? = string
  wrapper-namespace? = string
  content-type? = string />
The resource identified by the URI in the
href
attribute is loaded, encoded, wrapped in
the wrapper element, and returned as a document. If the URI
protocol supports redirection, then redirects
must
be followed.
The value of the
wrapper
attribute
must
be a
QName
. If the lexical value does not contain a colon,
then the
wrapper-namespace
may be
used to specify the namespace of the wrapper. In that case, the
wrapper-prefix
may be specified to
suggest a prefix for the wrapper element.
It is a
dynamic
error
err:XD0034
) to specify a new namespace or
prefix if the lexical value of the specified name contains a colon
(or if no
wrapper
is explicitly
specified).
In other words, these two
p:data
elements produce equivalent wrappers:
wrapper="x:wrap"/>
wrapper-namespace="http://example.com/ns/"/>
but this
p:data
element will raise
an error:
wrapper="x:wrap" wrapper-prefix="x"
wrapper-namespace="http://example.com/ns/"/>
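The rule behind err:XD0034 can be modelled in a few lines of Python. This is only a sketch of the check, not a processor implementation: the exception class and the function name `check_name_options` are hypothetical, and the function deliberately ignores actual namespace resolution.

```python
class XProcDynamicError(Exception):
    """Hypothetical exception type carrying an XProc error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_name_options(name, prefix=None, namespace=None):
    """Reject a prefix/namespace override when the name cannot accept one."""
    overrides = prefix is not None or namespace is not None
    # An override is only legal for an explicit name without a colon.
    if overrides and (name is None or ":" in name):
        raise XProcDynamicError(
            "err:XD0034",
            "a new prefix or namespace needs an explicit, unprefixed name",
        )
    # With no wrapper specified, the default wrapper name applies.
    return name if name is not None else "c:data"
```

The same shape of check applies wherever a QName-valued option is paired with separate prefix and namespace options.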
If no wrapper element is specified, the default is
c:data
string
charset? =
string
encoding? =
string
string
It is a
dynamic
error
err:XD0029
) if the document referenced by a
p:data
element does not exist, cannot
be accessed, or cannot be encoded as specified.
Exactly how the data is encoded depends on the media type of the
resource. If the resource has a content type associated with it
(e.g., if the resource was retrieved with HTTP), then that content
type
must
be used; otherwise, if the
user specified a
content-type
on the
p:data
, then that content type should
be assumed.
If no content type
was specified or is associated with the resource, the inferred
content type is
implementation-dependent
If the media type of the response is an XML media type or text
type with a
charset
parameter that is a
Unicode character encoding (per [
Unicode TR#17
]) or
is recognized as a non-XML media type whose contents are encoded as
a sequence of Unicode characters (e.g. it has a
charset
parameter or the definition of the media
type is such that it requires Unicode), the data
must
be encoded as a Unicode character sequence.
If the media type is not an appropriate text type, or if the
processor does not recognize the media type, the content is
base64-encoded.
The resulting data is wrapped in an element with the name
specified in the
wrapper
attribute
(or
c:data
if no
wrapper
is specified).
Implementations
should
add a
content-type
attribute to the
wrapper element which indicates the specified or inferred media
type of the resource (including any parameters). If the content was
base64-encoded, the wrapper
must
have
an
encoding
attribute which
specifies “
base64
”.
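The encoding decision described above might be sketched as follows. This is an illustration under stated assumptions, not normative behaviour: the helper names are hypothetical, the set of Unicode charsets is a small assumed list, and media types that imply Unicode without a `charset` parameter are not recognized, so they fall through to base64.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace bound to the "c" prefix in XProc.
C_NS = "http://www.w3.org/ns/xproc-step"
# Assumed, non-exhaustive list of Unicode charsets for this sketch.
UNICODE_CHARSETS = {"utf-8", "utf-16", "utf-16le", "utf-16be", "utf-32"}


def parse_content_type(value):
    """Split 'type/subtype; key=value' into (media_type, params)."""
    media_type, *raw_params = [part.strip() for part in value.split(";")]
    params = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return media_type.lower(), params


def wrap_data(payload, content_type):
    """Return a c:data element holding payload as text or as base64."""
    media_type, params = parse_content_type(content_type)
    charset = params.get("charset", "").lower()
    is_text = media_type.startswith("text/") or media_type.endswith(("/xml", "+xml"))
    wrapper = ET.Element("{%s}data" % C_NS)
    wrapper.set("content-type", content_type)
    if is_text and charset in UNICODE_CHARSETS:
        # Text with a Unicode charset becomes a character sequence.
        wrapper.text = payload.decode(charset)
    else:
        # Everything else is base64-encoded and flagged as such.
        wrapper.set("encoding", "base64")
        wrapper.text = base64.b64encode(payload).decode("ascii")
    return wrapper
```

A caller would pass the raw bytes and the content type reported for the resource, for example `wrap_data(raw_bytes, "text/plain; charset=utf-8")`.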
If an
encoding
is specified, a
charset
may also be specified. The
character set may be specified as a parameter on the
content-type
or via the separate
charset
option. If it is specified in both
places, the values
must
be
consistent.
If
content-type,
encoding
, or
charset
attributes are specified on a
c:data
wrapper, they
must not
be in a namespace; if the user-specified
wrapper is not
c:data
, then the
attributes
must
be in the
http://www.w3.org/ns/xproc-step
namespace.
Implementations may record additional details in
extension attributes
For example, this
p:identity
step:
might produce output like this:
AL,Alabama
AK,Alaska
AZ,Arizona
Whereas this pipeline fragment:
This is a chunk of XHTML.
produces a single
img
element with
an inline graphic (represented using a
data:
URI):
Some steps, such as
p:xquery
and
p:validate-with-relax-ng
, are designed to
process non-XML inputs. If a base64-encoded input occurs in such a
context, it
should
be decoded before
processing. In this way, for example, an XQuery document can be
read with
p:data
and passed to the
p:xquery
step
without regard to how the data was encoded.
5.15 p:empty
p:empty
connects to an
empty sequence
of
documents.
5.16 p:documentation
p:documentation
contains
human-readable documentation.
any-well-formed-content
There are no constraints on the content of the
p:documentation
element. Documentation is
ignored by pipeline processors. See
Section 3.6, “Documentation”
5.17 p:pipeinfo
p:pipeinfo
contains ancillary
information for steps in the pipeline.
any-well-formed-content
There are no constraints on the content of the
p:pipeinfo
element, see
Section 3.7, “Processor
annotations”
6 Errors
Errors in a pipeline can be divided into two classes: static
errors and dynamic errors.
6.1 Static
Errors
[Definition: A
static error
is one which can be
detected before pipeline evaluation is even attempted.]
Examples of static errors include cycles and incorrect
specification of inputs and outputs.
Static errors are fatal and must be detected before any steps
are evaluated.
For a complete list of static errors, see
Section E.1,
“Static Errors”
6.2 Dynamic Errors
[Definition: A
dynamic error
is one which occurs while
a pipeline is being evaluated.]
Examples of dynamic errors
include references to URIs that cannot be resolved, steps which
fail, and pipelines that exhaust the capacity of an implementation
(such as memory or disk space).
If a step fails due to a dynamic error, failure propagates
upwards until either a
p:try
is encountered or the entire pipeline
fails. In other words, outside of a
p:try
, step failure causes the entire
pipeline to fail.
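The propagation rule resembles ordinary exception handling. The following Python analogy is a sketch only: steps are modelled as plain functions, and `DynamicError`, `run`, and `try_catch` are hypothetical names, not part of XProc.

```python
class DynamicError(Exception):
    """Stand-in for an XProc dynamic error."""


def run(steps, document):
    """Run steps in order; an uncaught failure aborts the whole run."""
    for step in steps:
        document = step(document)
    return document


def try_catch(group, recover):
    """Model p:try: run a group of steps, recovering from dynamic errors."""
    def step(document):
        try:
            return run(group, document)
        except DynamicError as error:
            return recover(error)
    return step


def failing_step(document):
    """A step that always fails, like a p:document with a missing resource."""
    raise DynamicError("err:XD0011")
```

Running `run([failing_step], doc)` lets the failure escape to the caller, while `run([try_catch([failing_step], handler)], doc)` continues with whatever `handler` returns.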
For a complete list of dynamic errors, see
Section E.2,
“Dynamic Errors”
6.3 Step
Errors
Several of the steps in the standard and optional step libraries can
generate dynamic errors.
For a complete list of the dynamic errors raised by builtin
pipeline steps, see
Section E.3, “Step Errors”
7 Standard Step Library
This section describes the standard XProc steps. A
machine-readable description of these steps may be found in
xproc-1.0.xpl
When a step in this library produces an output document, the
base URI of the output is the base URI of the step's primary input
document unless the step's process explicitly sets an
xml:base
attribute or the step's description
explicitly states how the base URI is constructed.
Also, in this section, several steps use this
element for result information:
string
When a step uses an XPath to compute an option value, the XPath
context is as defined in
Section 2.6, “XPaths in XProc”
When a step specifies a particular version of a technology,
implementations
must
implement that
version or a subsequent version that is backwards compatible with
that version. At user-option, they may implement other
non-backwards compatible versions.
7.1 Required
Steps
This section describes standard steps that must be supported by
any conforming processor.
7.1.1 p:add-attribute
The
p:add-attribute
step adds a single
attribute to a set of matching elements. The input document
specified on the
source
is processed for
matches specified by the match pattern in the
match
option. For each of these matches, the
attribute whose name is specified by the
attribute-name
option is set to the attribute value
specified by the
attribute-value
option.
The resulting document is produced on the
result
output port and consists of an exact copy of the
input with the exception of the matched elements. Each of the
matched elements is copied to the output with the addition of the
specified attribute with the specified value.
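<p:declare-step type="p:add-attribute">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="attribute-name" required="true"/>
  <p:option name="attribute-prefix"/>
  <p:option name="attribute-namespace"/>
  <p:option name="attribute-value" required="true"/>
</p:declare-step>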
The value of the
match
option
must
be an XSLTMatchPattern.
It is a
dynamic
error
err:XC0023
) if the match pattern does not
match an element.
The value of the
attribute-name
option
must
be a
QName
If the lexical value does not contain a colon, then the
attribute-namespace
may be used to specify the
namespace of the attribute. In that case, the
attribute-prefix
may be specified to suggest a
prefix for the attribute name.
It is a
dynamic error
err:XD0034
) to
specify a new namespace or prefix if the lexical value of the
specified name contains a colon. The corresponding expanded name is
used to construct the attribute.
The value of the
attribute-value
option
must
be a legal attribute value
according to XML.
If an attribute with the same name as the expanded name from the
attribute-name
option exists on the matched
element, the value specified in the
attribute-value
option is used to set the value of
that existing attribute. That is, the value of the existing
attribute is changed to the
attribute-value
value.
Note
If multiple attributes need to be set on the same element(s),
the
p:set-attributes
step can be used to set
them all at once.
This step cannot be used to add namespace declarations.
It is a
dynamic
error
err:XC0059
) if the QName value in the
attribute-name
option uses the prefix “
xmlns
” or any other prefix that resolves
to the namespace name “http://www.w3.org/2000/xmlns/
”. Note, however, that
while namespace declarations cannot be added explicitly by this
step, adding an attribute whose name is in a namespace for which
there is no namespace declaration in scope on the matched element
may result in a namespace binding being added by
Section 2.4.1, “Namespace Fixup
on Outputs”
If an attribute named
xml:base
is
added or changed, the base URI of the element
must
also be amended accordingly.
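The core behaviour (copy the input, then set or overwrite one attribute on every matched element) can be sketched with Python's standard library. This is a simplification under stated assumptions: the hypothetical `add_attribute` helper accepts only a plain element name instead of a full XSLTMatchPattern, and it ignores namespace fixup and base URI handling.

```python
import copy
import xml.etree.ElementTree as ET


def add_attribute(document, element_name, attribute_name, attribute_value):
    """Return a copy of document with the attribute set on matching elements."""
    # Namespace declarations cannot be added this way (err:XC0059).
    if attribute_name.startswith("xmlns:"):
        raise ValueError("err:XC0059: cannot add a namespace declaration")
    result = copy.deepcopy(document)  # the input itself is left untouched
    for element in result.iter(element_name):
        # set() adds the attribute or replaces an existing value.
        element.set(attribute_name, attribute_value)
    return result
```

For instance, `add_attribute(ET.fromstring('<doc><p/></doc>'), "p", "class", "note")` yields a copy in which the `p` element carries `class="note"`.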
7.1.2 p:add-xml-base
The
p:add-xml-base
step exposes the base
URI via explicit
xml:base
attributes. The
input document from the
source
port is
replicated to the
result
port with
xml:base
attributes added to or corrected on each
element as specified by the options on this step.
<p:declare-step type="p:add-xml-base">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="all" select="'false'"/>
  <p:option name="relative" select="'true'"/>
</p:declare-step>
The value of the
all
option
must
be a boolean.
The value of the
relative
option
must
be a boolean.
It is a
dynamic
error
err:XC0058
) if the
all
and
relative
options are
both
true
The
p:add-xml-base
step modifies
its input as follows:
For the document element: force the element to have an
xml:base
attribute with the
document's [base URI] property's value as its value.
For other elements:
If the
all
option has the value
true
, force the element to have an
xml:base
attribute with the
element's [base URI] value as its value.
If the element's [base URI] is different from its parent's
[base URI], force the element to have an
xml:base
attribute with the following value:
if the value of the
relative
option is
true
, a string which, when resolved
against the parent's [base URI], will give the element's [base
URI], otherwise the element's [base URI].
Otherwise, if there is an
xml:base
attribute present, remove it.
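The rules above can be traced on a toy tree. The sketch below is illustrative only: the `Element` class, `relativize`, and `add_xml_base` are hypothetical names, each element carries its [base URI] as a plain string, the tree is modified in place, and `relativize` falls back to the absolute URI whenever it cannot produce a relative reference that resolves back correctly.

```python
import posixpath
from dataclasses import dataclass, field
from urllib.parse import urljoin, urlsplit

XML_BASE = "xml:base"


@dataclass
class Element:
    name: str
    base_uri: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def relativize(parent_base, base):
    """Return a reference that resolves to base against parent_base."""
    p, b = urlsplit(parent_base), urlsplit(base)
    same_origin = (p.scheme, p.netloc) == (b.scheme, b.netloc)
    if same_origin and b.path.startswith("/") and not b.query and not b.fragment:
        candidate = posixpath.relpath(b.path, posixpath.dirname(p.path) or "/")
        if urljoin(parent_base, candidate) == base:
            return candidate
    return base  # an absolute URI always resolves to itself


def add_xml_base(root, all_elements=False, relative=True):
    if all_elements and relative:
        raise ValueError("err:XC0058: 'all' and 'relative' are both true")
    # Document element: always gets the document's base URI.
    root.attrs[XML_BASE] = root.base_uri

    def visit(parent):
        for child in parent.children:
            if all_elements:
                child.attrs[XML_BASE] = child.base_uri
            elif child.base_uri != parent.base_uri:
                child.attrs[XML_BASE] = (
                    relativize(parent.base_uri, child.base_uri)
                    if relative else child.base_uri
                )
            else:
                child.attrs.pop(XML_BASE, None)  # redundant attribute removed
            visit(child)

    visit(root)
    return root
```

Note that the defaults mirror the step's declared defaults (`all` false, `relative` true), so requesting `all_elements=True` also requires `relative=False`.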
7.1.3 p:compare
The
p:compare
step compares two documents
for equality.
<p:declare-step type="p:compare">
  <p:input port="source" primary="true"/>
  <p:input port="alternate"/>
  <p:output port="result" primary="false"/>
  <p:option name="fail-if-not-equal" select="'false'"/>
</p:declare-step>
The value of the
fail-if-not-equal
option
must
be a boolean.
This step takes single documents on each of two ports and
compares them using the
fn:deep-equal
function (as
defined in [
XPath
2.0 Functions and Operators
]).
It is a
dynamic
error
err:XC0019
) if the documents are not equal,
and the value of the
fail-if-not-equal
option is
true
. If the documents are
equal, or if the value of the
fail-if-not-equal
option is
false
, a
c:result
document is produced with contents
true
if the documents are equal, otherwise
false
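The control flow of the step (compare, then either fail or report) can be sketched in Python. Note the substitution: the sketch compares canonical XML serializations, which only approximates `fn:deep-equal` and is not the comparison the step is defined to use. The function name `compare` and the string return value (standing in for the content of `c:result`) are assumptions of this example.

```python
import xml.etree.ElementTree as ET


def compare(source_xml, alternate_xml, fail_if_not_equal=False):
    """Return 'true' or 'false', or fail when inequality is fatal."""
    # Canonicalization normalizes attribute order and empty-element syntax.
    equal = ET.canonicalize(source_xml) == ET.canonicalize(alternate_xml)
    if not equal and fail_if_not_equal:
        raise ValueError("err:XC0019: documents are not equal")
    return "true" if equal else "false"
```

With `fail_if_not_equal` left at its default, unequal documents simply produce the string `false`.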
7.1.4 p:count
The
p:count
step counts the number of
documents in the
source
input sequence and
returns a single document on
result
containing that number. The generated document contains a single
c:result
element whose content is the string representation of the number
of documents in the sequence.
<p:declare-step type="p:count">
  <p:input port="source" sequence="true"/>
  <p:output port="result"/>
  <p:option name="limit" select="0"/>
</p:declare-step>
If the
limit
option is specified
and is greater than zero, the
p:count
step will count at most that many documents. This provides a
convenient mechanism to discover, for example, if a sequence
consists of more than 1 document, without requiring every single
document to be buffered before processing can continue.
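The benefit of a limit is easiest to see with a lazy sequence. In this minimal sketch the input is any Python iterable of documents and `count_documents` is a hypothetical name; with a positive limit, no more than that many documents are ever pulled from the source.

```python
from itertools import islice


def count_documents(source, limit=0):
    """Count documents, reading at most `limit` of them when limit > 0."""
    if limit > 0:
        source = islice(source, limit)  # stop pulling once the limit is hit
    return sum(1 for _ in source)
```

To test whether a sequence holds more than one document, a limit of 2 is enough: a result of 2 means "more than one" without reading the rest.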
7.1.5 p:delete
The
p:delete
step deletes items specified
by a match pattern from the
source
input
document and produces the resulting document, with the deleted
items removed, on the
result
port.
<p:declare-step type="p:delete">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>
The value of the
match
option
must
be an XSLTMatchPattern. A match
pattern may match multiple items to be deleted.
If an element is selected by the
match
option, the entire subtree rooted at that element is deleted.
This step cannot be used to remove namespaces.
It is a
dynamic
error
err:XC0062
) if the
match
option matches a namespace node. Also, note
that deleting an attribute named
xml:base
does not change the base URI of the
element on which it occurred.
7.1.6 p:directory-list
The
p:directory-list
step produces a list
of the contents of a specified directory.
<p:declare-step type="p:directory-list">
  <p:output port="result"/>
  <p:option name="path" required="true"/>
  <p:option name="include-filter"/>
  <p:option name="exclude-filter"/>
</p:declare-step>
The value of the
path
option
must
be an
anyURI
. It is interpreted as an IRI reference. If it is
relative, it is made absolute against the base URI of the element
on which it is specified (
p:with-option
or
p:directory-list
in the case of a
syntactic shortcut
value).
It is a
dynamic
error
err:XC0017
) if the absolute path does not
identify a directory.
It is a
dynamic error
err:XC0012
) if the
contents of the directory path are not available to the step due to
access restrictions in the environment in which the pipeline is
run.
Conformant processors
must
support directory paths whose
scheme is
file
. It is
implementation-defined
what
other schemes are supported by
p:directory-list
, and what the interpretation of
'directory', 'file' and 'contents' is for those schemes.
If present, the value of the
include-filter
or
exclude-filter
option
must
be a regular expression as specified in [
XPath 2.0
Functions and Operators
], section 7.6.1 “
Regular Expression Syntax
”.
If the
include-filter
pattern matches a
directory entry's name, the entry is included in the output. If the
exclude-filter
pattern matches a directory
entry's name, the entry is excluded from the output. If both options
are provided, the include filter is processed first, then the
exclude filter.
The result document produced for the specified
directory path has a
c:directory
document element whose base URI
is the directory path and whose
name
attribute is the last segment of the directory path (that is, the
directory's (local) name).
string
c:file
c:directory
c:other
)*
Its contents are determined as follows, based on the entries in
the directory identified by the directory path. For each entry in
the directory, if either no
filter
was
specified, or the (local) name of the entry matches the filter
pattern, a
c:file
, a
c:directory
, or a
c:other
element is
produced, as follows:
c:directory
is produced for each
subdirectory not determined to be special.
c:file
is produced for each file not
determined to be special.
string
/>
Any file or
directory determined to be special by the
p:directory-list
step may be output using a
c:other
element but the criteria for marking a file as special are
implementation-defined
string
/>
When a directory entry is a subdirectory, that directory's
entries are not output as part of that entry's
c:directory
. A
user must apply this step again to the subdirectory to list
subdirectory contents.
Each of the elements
c:file,
c:directory
, and
c:other
has a
name
attribute when it appears within the
top-level
c:directory
element, whose value is a
relative IRI reference, giving the (local) file or directory
name.
Any attributes other than
name
on
c:file,
c:directory
, or
c:other
are
implementation-defined
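A rough model of the listing for a local file-system path is sketched below. Several points are assumptions of the sketch rather than requirements: the function name `directory_list`, the use of Python's `re` syntax in place of XPath regular expressions, the choice that a filter must match the entire entry name, the alphabetical ordering, and the treatment of anything that is neither a regular file nor a directory as "special".

```python
import os
import re
import xml.etree.ElementTree as ET

# Namespace bound to the "c" prefix in XProc.
C_NS = "http://www.w3.org/ns/xproc-step"


def directory_list(path, include_filter=None, exclude_filter=None):
    """Build a c:directory element describing the entries of path."""
    if not os.path.isdir(path):
        raise ValueError("err:XC0017: not a directory: " + path)
    name = os.path.basename(os.path.normpath(path))
    root = ET.Element("{%s}directory" % C_NS, name=name)
    with os.scandir(path) as entries:
        ordered = sorted(entries, key=lambda entry: entry.name)
    for entry in ordered:
        # The include filter is applied first, then the exclude filter.
        if include_filter is not None and not re.fullmatch(include_filter, entry.name):
            continue
        if exclude_filter is not None and re.fullmatch(exclude_filter, entry.name):
            continue
        if entry.is_dir():
            kind = "directory"  # subdirectories are listed, not descended into
        elif entry.is_file():
            kind = "file"
        else:
            kind = "other"
        ET.SubElement(root, "{%s}%s" % (C_NS, kind), name=entry.name)
    return root
```

Listing a subdirectory's contents requires a second call with that subdirectory's path, matching the non-recursive behaviour described above.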
7.1.7 p:error
The
p:error
step generates a
dynamic error
using the input provided to the step.
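<p:declare-step type="p:error">
  <p:input port="source" primary="false"/>
  <p:output port="result" sequence="true"/>
  <p:option name="code" required="true"/>
  <p:option name="code-prefix"/>
  <p:option name="code-namespace"/>
</p:declare-step>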
The value of the
code
option
must
be a
QName
If the lexical value does not contain a colon, then the
code-namespace
may be used to specify the
namespace of the code. In that case, the
code-prefix
may be specified to suggest a
prefix for the code.
It is a
dynamic error
err:XD0034
) to
specify a new namespace or prefix if the lexical value of the
specified name contains a colon.
This step uses the document provided on its input as the content
of the error raised. An instance of the
c:errors
element will
be produced on the error output port, as is always the case for
dynamic
errors
. The error generated can be caught by a
p:try
just like any other
dynamic error.
For authoring convenience, the
p:error
step is declared with a single, primary
output port. With respect to
connections
, this port behaves like any
other output port even though nothing can ever appear on it since
the step always fails.
For example, given the following invocation:
The error vocabulary element (and document) generated on the
error output port would be:
xmlns:my="http://www.example.org/error">
The
href,
line
and
column
, or
offset
, might also be present on the
c:error
to identify
the location of the
p:error
element in
the pipeline.
7.1.8 p:escape-markup
The
p:escape-markup
step applies XML
serialization to the children of the document element and replaces
those children with their serialization. The outcome is a single
element with text content that represents the "escaped" syntax of
the children as they were serialized.
<p:declare-step type="p:escape-markup">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="cdata-section-elements" select="''"/>
  <p:option name="doctype-public"/>
  <p:option name="doctype-system"/>
  <p:option name="escape-uri-attributes" select="'false'"/>
  <p:option name="include-content-type" select="'true'"/>
  <p:option name="indent" select="'false'"/>
  <p:option name="media-type"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="omit-xml-declaration" select="'true'"/>
  <p:option name="standalone" select="'omit'"/>
  <p:option name="undeclare-prefixes"/>
  <p:option name="version" select="'1.0'"/>
</p:declare-step>
This step supports the standard serialization options as
specified in
Section 7.3, “Serialization
Options”
. These options control how the output markup is
produced before it is escaped.
For example, the input:
produces:
<div xmlns="http://www.w3.org/1999/xhtml">
<p>This is a chunk of XHTML.</p>
</div>
Note
The result of this step is an XML document that contains the
Unicode characters that are the characters that result from
escaping the input. It is not encoded characters in a serialized
octet stream, therefore, the serialization options related to
encoding characters (
byte-order-mark
encoding
, and
normalization-form
) do not apply. They are omitted
from the standard serialization options on this step.
By default, this step
must not
generate an XML declaration in the escaped result.
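The effect of the step can be approximated with Python's standard library. This is a sketch under simplifying assumptions: the hypothetical `escape_markup` helper uses ElementTree's default serializer, supports none of the serialization options, and is only exercised here on content that is not in a namespace.

```python
import xml.etree.ElementTree as ET


def escape_markup(document_element):
    """Replace the children of the document element with their serialization."""
    result = ET.Element(document_element.tag, document_element.attrib)
    # tostring() includes each child's trailing text, so no text is lost.
    serialized = "".join(
        ET.tostring(child, encoding="unicode") for child in document_element
    )
    result.text = (document_element.text or "") + serialized
    return result
```

The result has no element children: the former markup is now ordinary text, which is escaped again (`&lt;`, `&gt;`) only when the result document itself is serialized.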
7.1.9 p:filter
The
p:filter
step selects portions of the
source document based on a (possibly dynamically constructed) XPath
select expression.
<p:declare-step type="p:filter">
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>
  <p:option name="select" required="true"/>
</p:declare-step>
This step behaves just like a
p:input
with a
select
expression except that the select
expression is computed dynamically.
7.1.10 p:http-request
The
p:http-request
step provides for
interaction with resources over HTTP or related protocols. The
input document provided on the
source
port
specifies a request by a single
c:request
element. This element specifies
the method, resource, and other request properties as well as
possibly including an entity body (content) for the request.
<p:declare-step type="p:http-request">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="byte-order-mark"/>
  <p:option name="cdata-section-elements" select="''"/>
  <p:option name="doctype-public"/>
  <p:option name="doctype-system"/>
  <p:option name="encoding"/>
  <p:option name="escape-uri-attributes" select="'false'"/>
  <p:option name="include-content-type" select="'true'"/>
  <p:option name="indent" select="'false'"/>
  <p:option name="media-type"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="normalization-form" select="'none'"/>
  <p:option name="omit-xml-declaration" select="'true'"/>
  <p:option name="standalone" select="'omit'"/>
  <p:option name="undeclare-prefixes"/>
  <p:option name="version" select="'1.0'"/>
</p:declare-step>
The standard serialization options are provided to control the
serialization of any XML content which is sent as part of the
request. The effect of these options is as specified in
Section 7.3, “Serialization
Options”
. See
Section 7.1.10.2, “Request
Entity body conversion”
for a discussion of when serialization
occurs in constructing a request.
It is a
dynamic
error
err:XC0040
) if the document element of the
document that arrives on the
source
port is
not
c:request
7.1.10.1 Specifying a request
An HTTP request is represented by a
c:request
element.
NCName
href? =
anyURI
detailed? =
boolean
status-only? =
boolean
username? =
string
password? =
string
auth-method? =
string
send-authorization? =
boolean
override-content-type? =
string
c:header
*,
c:multipart
c:body
)?)
It is a
dynamic
error
err:XC0006
) if the
method
is not specified on a
c:request
It is a
dynamic error
err:XC0005
) if the
request contains a
c:body
or
c:multipart
but the
method
does not allow for an entity body being
sent with the request.
It is a
dynamic
error
err:XC0004
) if the
status-only
attribute has the value
true
and the
detailed
attribute does not have the value
true
The
method
attribute specifies the method
to be used against the IRI specified by the
href
attribute, e.g.
GET
or
POST
(the value is not case-sensitive). If
the
href
attribute is not absolute, it will
be resolved against the base URI of the element on which it
occurs.
Note
In the case of simple “GET” requests, implementors are
encouraged to support as many protocols as practical. In
particular, pipeline authors may attempt to use
p:http-request
to load documents with computed URIs using the
file:
scheme.
If the
username
attribute is specified,
the
username,
password,
auth-method
, and
send-authorization
attributes are used to handle
authentication according to the selected authentication method.
For the purposes of avoiding an authentication challenge, if the
send-authorization
attribute has the value
true
and the authentication method
specified by the
auth-method
supports
generation of an
Authorization
header without
a challenge, then an
Authorization
header is
generated and sent on the first request. If the
send-authorization
attribute is absent or has the value
false
, then the first request is sent
without an
Authorization
header.
If the initial response to the request is an authentication
challenge, the
auth-method,
username,
password,
and any
relevant data from the challenge are used to generate an
Authorization
header and the request is sent again. If
that authorization fails, the request is not retried.
Appropriate values for the
auth-method
attribute are “Basic” or “Digest” but other values are allowed. If
the authentication method is “Basic” or “Digest”, authentication is
handled as per [
RFC
2617
].
The
interpretation of
auth-method
values on
c:request
other than “Basic” or
“Digest” is
implementation-defined
It is a
dynamic
error
err:XC0003
) if a
username
or
password
is
specified without specifying an
auth-method
, if the requested
auth-method
isn't supported, or the authentication
challenge contains an authentication method that isn't supported.
All implementations are required to support "Basic" and "Digest"
authentication per [
RFC
2617
].
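The decision about the first request's `Authorization` header might look like the sketch below. The function name `first_request_headers` is hypothetical, method names are compared case-insensitively as an assumption, and only "Basic" is treated as able to produce a header without a challenge, since "Digest" depends on values supplied by the server's challenge.

```python
import base64


def first_request_headers(username=None, password=None, auth_method=None,
                          send_authorization=False):
    """Return the authentication headers to send on the first request."""
    if username is None and password is None:
        return {}
    if auth_method is None:
        raise ValueError("err:XC0003: credentials given without auth-method")
    method = auth_method.lower()
    if method not in ("basic", "digest"):
        # Other methods are implementation-defined; this sketch has none.
        raise ValueError("err:XC0003: unsupported auth-method " + auth_method)
    if send_authorization and method == "basic":
        credentials = f"{username}:{password or ''}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        return {"Authorization": "Basic " + token}
    # Otherwise the first request goes out unauthenticated and any
    # challenge in the response drives a second, authorized request.
    return {}
```

With `send_authorization` false (or absent), the function returns no headers even for "Basic", which corresponds to waiting for the server's challenge.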
The
c:header
element specifies a header
name and value, either for inclusion in a request, or as received
in a response.
string
value
string
/>
The request is formulated from the attribute values on the
c:request
element and its
c:header
and
c:multipart
or
c:body
children, if present, and transmitted to the host (and port, if
present) specified by the
href
attribute. The
details of how the request entity body, if any, is constructed are
given in
Section 7.1.10.2,
“Request Entity body conversion”
When the request is formulated, the step and/or protocol
implementation may add headers as necessary to either complete the
request or as appropriate for the content specified (e.g. transfer
encodings). A user of this step is guaranteed that their requested
headers and content will be sent with the exception of any
conflicts with protocol-related headers.
The
p:http-request
step allows users to specify
independently values that are not always independent. For example,
some combinations of
c:header
values (e.g.,
Content-Type
) may be inconsistent with values that
the step and/or protocol implementation must set. In a few cases,
the step provides more than one mechanism to specify what is
actually a single value (e.g., the boundary string in multipart
messages).
It
is a
dynamic
error
err:XC0020
) if the user specifies a value
or values that are inconsistent with each other or with the
requirements of the step or protocol.
7.1.10.2 Request Entity body
conversion
The
c:multipart
element specifies a
multi-part body, per [
RFC
1521
], either for inclusion in a request or as received
in a response.
string
boundary
string
c:body
In the context of a request, the media type of the
c:multipart
must
be a multipart media type (i.e.
have a main type of 'multipart'). If the
content-type
attribute is not specified, a value of
"multipart/mixed" will be assumed.
The
boundary
attribute is required and is
used to provide a multipart boundary marker. The implementation
must use this boundary marker and must prefix the value with the
string “
--
” when formulating the multipart
message.
It is a
dynamic
error
err:XC0002
) if the value starts with the
string “
--
”.
If the boundary is also specified as a parameter in the
content-type
attribute, then the parameter value and the
boundary
attribute value
must
be the same.
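The boundary rules can be illustrated with a small message builder. This is a sketch, not a complete MIME implementation: `build_multipart` is a hypothetical name, parts are given as (content type, text) pairs, and reporting an inconsistent boundary as err:XC0020 is an assumption based on that error's general description.

```python
def build_multipart(parts, boundary, content_type="multipart/mixed"):
    """Assemble a multipart entity body from (content_type, text) pairs."""
    if boundary.startswith("--"):
        raise ValueError("err:XC0002: boundary must not start with '--'")
    media_type, _, params = content_type.partition(";")
    if not media_type.strip().lower().startswith("multipart/"):
        raise ValueError("content-type must be a multipart media type")
    for param in params.split(";"):
        key, _, value = param.partition("=")
        # A boundary parameter, if present, has to agree with the attribute.
        if key.strip().lower() == "boundary" and value.strip().strip('"') != boundary:
            raise ValueError("err:XC0020: boundary values are inconsistent")
    lines = []
    for part_type, body in parts:
        # Each part is introduced by the boundary prefixed with "--".
        lines += ["--" + boundary, "Content-Type: " + part_type, "", body]
    lines.append("--" + boundary + "--")  # closing delimiter
    return "\r\n".join(lines) + "\r\n"
```

Because the builder adds the `--` prefix itself, a caller supplies the bare marker, for example `build_multipart(parts, "frontier")`.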
The
c:body
element holds the body or body part of the message. Each of the
attributes controls some aspect of encoding the request
body or decoding the body element's content when the request is
formulated. These are specified as follows:
string
encoding? =
string
id? =
string
description? =
string
disposition? =
string
anyElement
The
content-type
attribute specifies the
media type of the body or body part, that is, the value of its
Content-Type
header. If the media type is not
an XML type or text, the content must already be
base64-encoded.
The
encoding
attribute controls the
decoding of the element content for formulating the body. A value
of
base64
indicates the element's content
is a base64 encoded string whose byte stream should be sent as the
message body.
An implementation
may support encodings other than
base64
but these encodings and their names are
implementation-defined
It is a
dynamic
error
err:XC0052
) if the encoding specified is not
supported by the implementation.
Note
The
p:http-request
step provides only a single
set of serialization options for XML media types. There's no direct
support for sending a multipart message with two XML parts encoded
differently.
For each body or body part, the
id
attribute specifies the value of the
Content-ID
header; the
description
attribute specifies the value of the
Content-Description
header; and the
disposition
attribute specifies the value of
the
Content-Disposition
header.
If an entity body is to be sent as part of a request (e.g. a
POST
), either a
c:body
element, specifying the request
entity body, or a
c:multipart
element, specifying multiple
entity body parts, may be used. When
c:multipart
is
used it may contain multiple
c:body
children. A
c:body
specifies the
construction of a body or body part as follows:
If the
content-type
attribute does not
specify an XML media type, or the
encoding
attribute is “
base64
”, then
it is a
dynamic
error
err:XC0028
) if the content of the
c:body
element does not
consist entirely of characters, and the entity body or body part
will consist of exactly those characters.
Otherwise (the
content-type
attribute
does
specify an XML media type and the
encoding
attribute is
not
'base64'),
it is a
dynamic
error
err:XC0022
) if the content of the
c:body
element does not
consist of exactly one element, optionally preceded and/or followed
by any number of processing instructions, comments or whitespace
characters, and the entity body or body part will consist of the
serialization of a document node containing that content. The
serialization of that document is controlled by the serialization
options on the
p:http-request
step
itself.
For example, the following input to a
p:http-request
step will POST a small XML document:
The corresponding request should look something like this:
POST http://example.com/someservice HTTP/1.1
Host: example.com
Content-Type: application/xml; charset="utf-8"
OK! This is a chunk. This is a another chunk.
7.1.10.3 Managing the response
The handling of the response to the request and the generation
of the step's result document is controlled by the
status-only,
override-content-type,
and
detailed
attributes on the
c:request
input.
The
override-content-type
attribute
controls interpretation of the response's
Content-Type
header. If this attribute is present, the
response will be treated as if it returned the
Content-Type
given by its value. This original
Content-Type
header will however be reflected
unchanged as a
c:header
in the result document.
It is a
dynamic
error
err:XC0030
) if the
override-content-type
value cannot be used (e.g.
text/plain
to override
image/png
).
If the
status-only
attribute has the value
true
, the result document will contain
only header information. The entity of the response will not be
processed to produce a
c:body
or
c:multipart
element.
The
c:response
element represents an HTTP
response. The response's status code is encoded in the
status
attribute and the headers and entity body are
processed into
c:header
and
c:multipart
or
c:body
content.
integer
c:header
*,
c:multipart
c:body
)?)
The value of the
detailed
attribute
determines the content of the result document. If it is
true
, the response to the request is handled as
follows:
A single
c:response
element is produced with the
status
attribute containing the status of the
response received.
Each response header is translated into a
c:header
element.
Unless the
status-only
attribute has a
value
true
, the entity body of the
response is converted into a
c:body
or
c:multipart
element via the rules given in
Section 7.1.10.4,
“Converting Response Entity Bodies”
Otherwise (the
detailed
attribute is not
specified or its value is
false
), the
response to the request is handled as follows:
If the media type (as determined by the
override-content-type
attribute or the
Content-Type
response header) is an XML media type, the
entity is decoded if necessary, then parsed as an XML document and
produced on the
result
output port as the
entire output of the step.
Otherwise, the entity body of the response is converted into a
c:body
or
c:multipart
element via the rules given in
Section 7.1.10.4,
“Converting Response Entity Bodies”
In either case the base URI of the output document is the
resolved value of the
href
attribute from the
input
c:request
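The interaction of the three attributes can be summarized as a small decision function. The sketch below is descriptive only: `result_shape` is a hypothetical name, it returns a label for the kind of result document instead of building one, and its XML media type test (a type ending in `/xml` or `+xml`) is a simplifying assumption.

```python
def result_shape(content_type, detailed=False, status_only=False,
                 override_content_type=None):
    """Describe the result document produced for a response."""
    if status_only and not detailed:
        raise ValueError("err:XC0004: status-only requires detailed")
    # override-content-type, when present, replaces the response's type.
    effective = override_content_type or content_type
    media_type = effective.split(";")[0].strip().lower()
    if detailed:
        if status_only:
            return "c:response with c:header children only"
        return "c:response with c:header children and a converted entity body"
    if media_type.endswith(("/xml", "+xml")):
        return "parsed XML document"
    if media_type.startswith("multipart/"):
        return "c:multipart"
    return "c:body"
```

For example, a `text/plain` response overridden with `application/xml` is treated as XML and parsed, while the same response without the override is wrapped in `c:body`.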
7.1.10.3.1 Redirects
One possible response from an HTTP request is a redirect,
indicated by a status code in the three-hundred range. The precise
semantics of the 3xx return codes are laid out by section
10.3 Redirection 3xx
in [
RFC 2616
].
The
p:http-request
step
should
follow redirect requests (in a manner
consistent with [
RFC
2616
]) if they are returned by the server.
7.1.10.3.2 Cookies
With one exception, in version 1.0 of XProc, the
p:http-request
step does not provide any standard mechanisms for managing cookies.
Pipeline authors that need to
preserve cookies across several
p:http-request
calls in the same pipeline or across multiple invocations of the
same or different pipelines will have to rely on
implementation-defined
mechanisms.
The exception arises in the case of redirection. If a redirect
response includes cookies, those cookies
should
be forwarded as appropriate to the
redirected location when the redirection is followed.
This behavior will allow the
p:http-request
step to interoperate with web services that use cookies as part of
an authentication protocol.
7.1.10.4 Converting Response Entity Bodies
The entity of a response may be multipart per [
RFC 1521
]. In those
situations, the result document will be a
c:multipart
element that contains multiple
c:body
elements inside.
Note
Although it is technically possible for any of the individual
parts of a multipart message to
also
be multipart, XProc
does not provide a standard representation for such messages.
The interpretation of a
multipart message inside another multipart message is
implementation-dependent
The result of the
p:http-request
step is an XML document. For
media types (images, binaries, etc.) that can't be represented as a
sequence of Unicode characters, the response is encoded as
base64
and then returned as text children
of the
c:body
element. If the content is base64-encoded, the
encoding
attribute on
c:body
must be set to “base64”.
If the media type of the response is a text type with a
charset
parameter that is a Unicode character
encoding (per [
Unicode
TR#17
]) or is recognized as a non-XML media type whose
contents are encoded as a sequence of Unicode characters (e.g. it
has a charset parameter or the definition of the media type is
such that it requires Unicode), the content of the constructed
c:body
element
is the translation of the text into a sequence of Unicode
characters.
If the response is an XML media type, the content of the
constructed
c:body
element is the result of decoding the
body as necessary, then parsing it with an XML parser. If the
content is not well-formed, the step fails.
In a
c:body
in a response, the
content-type
attribute
must
be an exact copy of the
value returned in the
Content-Type
header.
That is, it must reflect the content type actually returned, not
any override value that may have been specified, and it must
include any parameters returned by the server.
In the case of a multipart response, the same rules apply when
constructing a
c:body
element for each body part
encountered.
Note
Given the above description, any content identified as
text/html
will be encoded as (escaped) text
or base64-encoded in the
c:body
element, as HTML isn't always
well-formed XML. A user can attempt to convert such content into
XML using the
p:unescape-markup
step.
7.1.10.5 HTTP Request Example
A simple form might be posted as follows:
name=W3C&spec=XProc
and if the response was an XHTML document, the result document
would be:
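A sketch of what such a request and its result could look like is given here. The target URL, the header value and the XHTML content are placeholders rather than values from a real exchange, and the response form assumes that the request asks for detailed output.

```xml
<!-- Hypothetical request document for the p:http-request source port.
     The ampersand in the form data must be escaped in XML. -->
<c:request xmlns:c="http://www.w3.org/ns/xproc-step"
           method="post"
           href="http://www.example.com/form-action"
           detailed="true">
  <c:body content-type="application/x-www-form-urlencoded">name=W3C&amp;spec=XProc</c:body>
</c:request>

<!-- Indicative result document: one c:header per response header and
     the parsed XHTML entity inside c:body. All values are placeholders. -->
<c:response xmlns:c="http://www.w3.org/ns/xproc-step" status="200">
  <c:header name="Content-Type" value="application/xhtml+xml"/>
  <c:body content-type="application/xhtml+xml">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>OK</title></head>
      <body><p>OK!</p></body>
    </html>
  </c:body>
</c:response>
```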
7.1.11 p:identity
The
p:identity
step makes a verbatim copy
of its input available on its output.
<p:declare-step type="p:identity">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
</p:declare-step>
If the implementation supports passing PSVI annotations between
steps, the
p:identity
step
must
preserve any annotations that
appear in the input.
7.1.12 p:insert
The
p:insert
step inserts the
insertion
port's document into the
source
port's document relative to the matching
elements in the
source
port's document.
<p:declare-step type="p:insert">
  <p:input port="source" primary="true"/>
  <p:input port="insertion" sequence="true"/>
  <p:output port="result"/>
  <p:option name="match" select="'/*'"/>
  <p:option name="position" required="true"/>
</p:declare-step>
The value of the
match
option
must
be an XSLTMatchPattern.
It is a dynamic error (err:XC0023) if that pattern matches anything
other than element, text, processing-instruction, or comment nodes.
Multiple matches are allowed, in which case multiple copies of the
insertion
documents will occur. If no
elements match, then the document is unchanged.
The value of the
position
option
must
be an NMTOKEN in the following
list:
“first-child” - the insertion is made as the first child of the match;
“last-child” - the insertion is made as the last child of the match;
“before” - the insertion is made as the immediate preceding sibling of the match;
“after” - the insertion is made as the immediate following sibling of the match.
It is a dynamic error (err:XC0025) if the match pattern matches
anything other than an element node and the value of the
position
option is “
first-child
” or “
last-child
”.
As the inserted elements are part of the output of the step they
are not considered in determining matching elements. If an empty
sequence appears on the
insertion
port, the
result will be the same as the source.
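A minimal sketch of p:insert in use follows. The element names doc, section and note are hypothetical, chosen only to show how match and position interact.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Insert a copy of the inline note as the first child of every
       matching section element in the source document. -->
  <p:insert match="/doc/section" position="first-child">
    <p:input port="insertion">
      <p:inline><note>Draft</note></p:inline>
    </p:input>
  </p:insert>
</p:pipeline>
```

Because the pattern matches only element nodes, “first-child” is permitted here; a pattern matching text or comment nodes would have to use “before” or “after” instead.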
7.1.13 p:label-elements
The
p:label-elements
step generates a
label for each matched element and stores that label in the
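specified attribute.
<p:declare-step type="p:label-elements">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="attribute" select="'xml:id'"/>
  <p:option name="attribute-prefix"/>
  <p:option name="attribute-namespace"/>
  <p:option name="label" select="'concat(&quot;_&quot;,$p:index)'"/>
  <p:option name="match" select="'*'"/>
  <p:option name="replace" select="'true'"/>
</p:declare-step>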
The value of the
attribute
option
must
be a
QName.
If the lexical value does not contain a colon, then the
attribute-namespace
may be used to specify the
namespace of the attribute name. In that case, the
attribute-prefix
may be specified to suggest a
prefix for the attribute name.
It is a dynamic error (err:XD0034) to
specify a new namespace or prefix if the lexical value of the
specified name contains a colon.
The value of the
label
option is an
XPath expression used to generate the value of the attribute
label.
The value of the
match
option
must
be an XSLTMatchPattern.
It is a dynamic error (err:XC0023) if that expression matches
anything other than element nodes.
The value of the replace option must be a boolean value and is used to indicate
whether existing attribute values are replaced.
This step operates by generating attribute labels for each
element matched. For every matched element, the expression is
evaluated with the context node set to the matched element. An
attribute is added to the matched element using the attribute name
specified by the attribute option and the string value of the result of
evaluating the expression. If the attribute already exists on the matched
element, the value is replaced with the string value only if the replace
option has the value true.
If this step is used to add or change the value of an attribute
named “
xml:base
”, the base URI of the
element
must
also be amended
accordingly.
An implementation must bind the variable “p:index” in the static context of
each evaluation of the XPath expression to the position of the element in the
sequence of matched elements. In other words, the first element (in document
order) matched gets the value “1”, the second gets the value “2”, the third,
“3”, etc.
The result of the p:label-elements step is the input document
with the attribute labels associated with matched elements. All
other non-matching content remains the same.
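As a sketch of how the options combine, the pipeline below labels section elements without overwriting identifiers that are already present. The element name section and the “sec-” prefix are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- The label expression is evaluated once per matched element, with
       $p:index bound to that element's position among the matches.
       replace="false" leaves existing xml:id attributes untouched. -->
  <p:label-elements match="section"
                    attribute="xml:id"
                    label="concat('sec-', $p:index)"
                    replace="false"/>
</p:pipeline>
```

Under these assumptions the first matched section would receive the label sec-1, the second sec-2, and so on, except where an xml:id was already present.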
7.1.14 p:load
The
p:load
step has no inputs but produces
as its result an XML resource specified by an IRI.
<p:declare-step type="p:load">
  <p:output port="result"/>
  <p:option name="href" required="true"/>
  <p:option name="dtd-validate" select="'false'"/>
</p:declare-step>
The value of the
href
option
must
be an
anyURI
. It is interpreted as an IRI reference. If it is
relative, it is made absolute against the base URI of the element
on which it is specified (
p:with-option
or
p:load
in the case of a
syntactic shortcut
value).
The value of the
dtd-validate
option
must
be a boolean.
The
p:load
step is the same as
p:document
with two additions:
The URI to be accessed can be constructed dynamically by the
pipeline.
The
p:load
step has an option to
invoke DTD validation.
When dtd-validate is false, p:load processing is the same as p:document
processing on the computed href value.
When dtd-validate is true, p:load processing is the same as p:document
processing on the computed href value but must use a validating parser.
It is a dynamic error (err:XC0027) if the document is not valid or
the step doesn't support DTD validation.
The retrieved document is produced on the
result
port. The base URI of the result is the
(absolute) IRI used to retrieve it.
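The dynamic construction of the URI can be sketched as follows. The option name doc-name and the docs/ directory layout are assumptions made for this example only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <!-- Hypothetical pipeline option with a default value. -->
  <p:option name="doc-name" select="'chapter1'"/>

  <!-- The href is computed at run time; dtd-validate="true" requests a
       validating parse, so err:XC0027 is raised for an invalid document. -->
  <p:load dtd-validate="true">
    <p:with-option name="href"
                   select="concat('docs/', $doc-name, '.xml')"/>
  </p:load>
</p:declare-step>
```

A relative value computed this way is made absolute against the base URI of the p:with-option element, as described above.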
7.1.15 p:make-absolute-uris
The
p:make-absolute-uris
step makes an
element or attribute's value in the source document an absolute IRI
value in the result document.
<p:declare-step type="p:make-absolute-uris">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="base-uri"/>
</p:declare-step>
The value of the
match
option
must
be an XSLTMatchPattern.
It is a dynamic error (err:XC0023) if the pattern matches anything
other than element or attribute nodes.
The value of the
base-uri
option
must
be an
anyURI
. It is interpreted as an IRI reference. If it is
relative, it is made absolute against the base URI of the element
on which it is specified (
p:with-option
or
p:make-absolute-uris
in the case of a
syntactic shortcut
value).
For every element or attribute in the input document which
matches the specified pattern, its XPath string-value is resolved
against the specified base URI and the resulting absolute IRI is
used as the matched node's entire contents in the output.
The base URI used for resolution defaults to the matched
attribute's element or the matched element's base URI unless the
base-uri
option is specified. When the
base-uri
option is specified, the option
value is used as the base URI regardless of any contextual base URI
value in the document. This option value is resolved against the
base URI of the
p:option
element used to set the option.
If the IRI reference
specified by the
base-uri
option on
p:make-absolute-uris
is not valid, or
if it is absent and the input document has no base URI, the results
are
implementation-dependent.
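A brief sketch of the step with an explicit base URI follows. The img/@src pattern and the example.com address are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Every src attribute on an img element is resolved against the
       given base-uri, ignoring any base URI found in the document. -->
  <p:make-absolute-uris match="img/@src"
                        base-uri="http://www.example.com/docs/"/>
</p:pipeline>
```

Under these assumptions a value such as figures/a.png would become http://www.example.com/docs/figures/a.png in the result. Leaving out base-uri would resolve each value against the base URI of the element that carries the attribute.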
7.1.16 p:namespace-rename
The
p:namespace-rename
step renames any
namespace declaration or use of a namespace in a document to a new
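IRI value.
<p:declare-step type="p:namespace-rename">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="from"/>
  <p:option name="to"/>
  <p:option name="apply-to" select="'all'"/>
</p:declare-step>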
The value of the
from
option
must
be an
anyURI
. It
should
be
either empty or absolute, but will not be resolved in any case.
The value of the
to
option
must
be an
anyURI
. It
should
be empty or absolute, but will
not be resolved in any case.
The value of the apply-to option must be one of “all”, “elements”, or
“attributes”. If the value is “elements”, only elements will be renamed;
if the value is “attributes”, only attributes will be renamed; if the value
is “all”, both elements and attributes will be renamed.
It is a dynamic error (err:XC0014) if the XML namespace
(http://www.w3.org/XML/1998/namespace) or the XMLNS namespace
(http://www.w3.org/2000/xmlns/) is the value of either the from option
or the to option.
If the value of the
from
option is the
same as the value of the
to
option, the
input is reproduced unchanged on the output. Otherwise, namespace
bindings, namespace attributes and element and attribute names are
changed as follows:
Namespace bindings: If the
from
option
is present and its value is not the empty string, then every
binding of a prefix (or the default namespace) in the input
document whose value is the same as the value of the
from
option is
replaced in the output with a binding to the value of the
to
option, provided it is present and not
the empty string;
otherwise (the
to
option is not
specified or has an empty string as its value) absent from the
output.
If the
from
option is absent, or its
value is the empty string, then no bindings are changed or
removed.
Elements and attributes: If the
from
option is present and its value is not the empty string, for every
element and attribute, as appropriate, in the input whose namespace
name is the same as the value of the
from
option, in the output its namespace name is
replaced with the value of the
to
option, provided it is present and not the empty string;
otherwise (the
to
option is not
specified or has an empty string as its value) changed to have no
value.
If the
from
option is absent, or its
value is the empty string, then for every element and attribute, as
appropriate, whose namespace name has no value, in the output its
namespace name is set to the value of the
to
option.
Namespace attributes: If the
from
option
is present and its value is not the empty string, for every
namespace attribute in the input whose value is the same as the
value of the
from
option, in the output
the namespace attribute's value is replaced with the value of
the
to
option, provided it is present and
not the empty string;
otherwise (the
to
option is not
specified or has an empty string as its value) the namespace
attribute is absent.
Note
The
apply-to
option is primarily
intended to make it possible to avoid renaming attributes when the
from
option specifies no namespace, since
many attributes are in no namespace.
Care should be taken when specifying no namespace with the
to
option. Prefixed names in content, for
example QNames and XPath expressions, may end up with no
appropriate namespace binding.
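The rules above can be illustrated with a short sketch. Both namespace URIs are placeholders.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Move elements from the old namespace to the new one. With
       apply-to="elements", attributes that are in the old namespace
       keep their names. -->
  <p:namespace-rename from="http://www.example.com/ns/old"
                      to="http://www.example.com/ns/new"
                      apply-to="elements"/>
</p:pipeline>
```

Omitting the to option would instead move the matching elements into no namespace, which is the case the note above warns about for prefixed names in content.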
7.1.17 p:pack
The
p:pack
step merges two document
sequences in a pair-wise fashion.
<p:declare-step type="p:pack">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="alternate" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:option name="wrapper" required="true"/>
  <p:option name="wrapper-prefix"/>
  <p:option name="wrapper-namespace"/>
</p:declare-step>
The value of the
wrapper
option
must
be a
QName.
If the lexical value does not contain a colon, then the
wrapper-namespace
may be used to specify the
namespace of the wrapper. In that case, the
wrapper-prefix
may be specified to suggest a
prefix for the wrapper element.
It is a dynamic error (err:XD0034) to
specify a new namespace or prefix if the lexical value of the
specified name contains a colon.
The step takes each pair of documents, in order, one from the
source
port and one from the
alternate
port, wraps them with a new element node
whose QName is the value specified in the
wrapper
option, and writes that element to the
result
port as a document.
If the step reaches the end of one input sequence before the
other, then it simply wraps each of the remaining documents in the
longer sequence.
Note
In the common case, where the document element of a document in
the
result
sequence has two element children,
any comments, processing instructions, or white space text nodes
that occur between them may have come from either of the input
documents; this step does not attempt to distinguish which one.
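A sketch of pair-wise packing follows. The step name main and the wrapper name pair are arbitrary choices for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="alternate" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <!-- The primary source port is connected by default; the alternate
       port is not primary and needs an explicit connection. -->
  <p:pack wrapper="pair">
    <p:input port="alternate">
      <p:pipe step="main" port="alternate"/>
    </p:input>
  </p:pack>
</p:declare-step>
```

Given three documents on source and two on alternate, this would be expected to produce three pair documents, the last of which wraps only the third source document.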
7.1.18 p:parameters
The
p:parameters
step exposes a set of
parameters as a
c:param-set
document.
<p:declare-step type="p:parameters">
  <p:input port="parameters" kind="parameter" primary="false"/>
  <p:output port="result" primary="false"/>
</p:declare-step>
Each parameter passed to the step is converted into a
c:param
element. The
step resolves duplicate parameters in the normal way (see
Section 5.1.2,
“Parameter Inputs”
) so at most one parameter with any given
name will appear in the result. The resulting
c:param
elements are
wrapped in a
c:param-set
and the parameter set document
is written to the
result
port.
The order in which
c:param
elements occur
in the
c:param-set
is
implementation-dependent.
For consistency and user convenience, if any of the parameters
have names that are in a namespace, the
namespace
attribute on the
c:param
element
must
be used. Each
name
must
be an
NCName.
The base URI of the output document is the URI of the pipeline
document that contains the step.
Note
Since the
parameters
port is
not
primary, any explicit
p:with-param
settings
must
include a
port
attribute, per the last paragraph of
Section 5.7.4, “p:with-param”
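Because neither port of p:parameters is primary, both connections have to be written out. The sketch below exposes the parameters passed to a pipeline as its result; the step names main and params are arbitrary.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="1.0" name="main">
  <!-- Read the pipeline's own parameter input port explicitly. -->
  <p:parameters name="params">
    <p:input port="parameters">
      <p:pipe step="main" port="parameters"/>
    </p:input>
  </p:parameters>

  <!-- The result port is not primary either, so it is piped by name. -->
  <p:identity>
    <p:input port="source">
      <p:pipe step="params" port="result"/>
    </p:input>
  </p:identity>
</p:pipeline>
```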
7.1.19 p:rename
The
p:rename
step renames elements,
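attributes, or processing-instruction targets in a document.
<p:declare-step type="p:rename">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="new-name" required="true"/>
  <p:option name="new-prefix"/>
  <p:option name="new-namespace"/>
</p:declare-step>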
The value of the
match
option must be an
XSLTMatchPattern.
It is a dynamic error (err:XC0023) if the
pattern matches anything other than element, attribute or
processing instruction nodes.
The value of the
new-name
option must be a QName. If the lexical value does not
contain a colon, then the
new-namespace
may be used to specify the
namespace of the new name. In that case, the
new-prefix
may be specified to suggest a
prefix for the new name.
It is a dynamic error (err:XD0034) to
specify a new namespace or prefix if the lexical value of the
specified name contains a colon.
Each element, attribute, or processing-instruction in the input
matched by the match pattern specified in the
match
option is renamed in the output to the name
specified by the
new-name
option.
If the
match
option matches an attribute
and if the element on which it occurs already has an attribute
whose expanded name is the same as the expanded name of the
specified
new-name
, then the result is as
if the current attribute named “
new-name
” was deleted before renaming the
matched attribute.
With respect to attributes named “xml:base”, the following semantics apply:
renaming an attribute from “xml:base” to something else has no effect on the
underlying base URI of the element; however, if an attribute is renamed from
something else to “xml:base”, the base URI of the element must also be
amended accordingly.
If the pattern matches processing instructions, then it is the
processing instruction target that is renamed.
It is a dynamic error (err:XC0013) if the pattern matches a
processing instruction and the new name has a non-null
namespace.
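A sketch of renaming elements into a namespace follows. The source element name para and the suggested prefix h are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- new-name contains no colon, so new-namespace and new-prefix may
       be given; combining them with a prefixed new-name would raise
       err:XD0034. -->
  <p:rename match="para"
            new-name="p"
            new-namespace="http://www.w3.org/1999/xhtml"
            new-prefix="h"/>
</p:pipeline>
```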
7.1.20 p:replace
The
p:replace
step replaces matching nodes
in its primary input with the document element of the
replacement
port's document.
<p:declare-step type="p:replace">
  <p:input port="source" primary="true"/>
  <p:input port="replacement"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>
The value of the
match
option
must
be an XSLTMatchPattern.
It is a dynamic error (err:XC0023) if that pattern matches anything
other than element, text, processing-instruction, or comment nodes.
Multiple matches are allowed, in which case multiple copies of the
replacement
document will occur.
Every node in the primary input matching the specified pattern
is replaced in the output by the document element of
the
replacement
document. Only non-nested
matches are replaced. That is, once a node is replaced, its
descendants cannot be matched.
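A minimal sketch of the step follows. The element names placeholder and para are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Each placeholder element in the source is replaced by a copy of
       the document element of the replacement document. -->
  <p:replace match="placeholder">
    <p:input port="replacement">
      <p:inline><para>Actual content</para></p:inline>
    </p:input>
  </p:replace>
</p:pipeline>
```

If one placeholder element were nested inside another, only the outer one would be replaced, since the descendants of a replaced node are no longer candidates for matching.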
7.1.21 p:set-attributes
The
p:set-attributes
step sets
attributes on matching elements.
<p:declare-step type="p:set-attributes">
  <p:input port="source" primary="true"/>
  <p:input port="attributes"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>
The value of the
match
option
must
be an XSLTMatchPattern.
It is a dynamic error (err:XC0023) if that pattern matches anything
other than element nodes.
Each attribute on the document element of the document that
appears on the
attributes
port is copied to
each element that matches the
match
expression.
If an attribute with the same name as one of the attributes to
be copied already exists, the value specified on the
attributes
port's document is used. The result port of
this step produces a copy of the
source
port's document with the matching elements' attributes
modified.
The matching elements are specified by the match pattern in the
match
option. All matching elements are
processed. If no elements match, the step will not change any
elements.
This step must not copy namespace declarations. If the attributes copied
from the attributes port use namespaces, prefixes, or prefixes bound to
different namespaces, the document produced on the result output port will
require namespace fixup.
If an attribute named
xml:base
is
added or changed, the base URI of the element
must
also be amended accordingly.
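A sketch of the step follows. The element name attrs and the attribute values are hypothetical; only the attributes of the document element on the attributes port matter, not its name.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Copy border and class onto every table element. An existing
       attribute with the same name takes the value given here. -->
  <p:set-attributes match="table">
    <p:input port="attributes">
      <p:inline><attrs border="1" class="data"/></p:inline>
    </p:input>
  </p:set-attributes>
</p:pipeline>
```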
7.1.22 p:sink
The
p:sink
step accepts a sequence
of documents and discards them. It has no output.
<p:declare-step type="p:sink">
  <p:input port="source" sequence="true"/>
</p:declare-step>
7.1.23 p:split-sequence
The
p:split-sequence
step accepts a
sequence of documents and divides it into two sequences.
<p:declare-step type="p:split-sequence">
  <p:input port="source" sequence="true"/>
  <p:output port="matched" sequence="true" primary="true"/>
  <p:output port="not-matched" sequence="true"/>
  <p:option name="initial-only" select="'false'"/>
  <p:option name="test" required="true"/>
</p:declare-step>
The value of the
test
option
must
be an XPathExpression.
The XPath expression in the
test
option
is applied to each document in the input sequence. If the effective
boolean value of the expression is true, the document is copied to
the
matched
port; otherwise it is copied to
the
not-matched
port.
If the
initial-only
option is true, then
when the first document that does not satisfy the test expression
is encountered, it
and all the documents that follow it
are written to the
not-matched
port. In other
words, it only writes the initial series of matched documents
(which may be empty) to the
matched
port. All
other documents are written to the
not-matched
port, irrespective of whether or not they
match.
The
XPath context
for the
test
option changes over time. For each
document that appears on the
source
port, the
expression is evaluated with that document as the context document.
The context position (position()) is the position of that document within
the sequence and the context size (last()) is the total number of documents
in the sequence.
Note
In principle, this component cannot stream because it must
buffer all of the input sequence in order to find the context size.
In practice, if the test expression does not use the
last()
function, the implementation can stream and
ignore the context size.
If the implementation supports passing PSVI annotations between
steps, the
p:split-sequence
step
must
preserve any annotations that
appear in the input.
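A sketch that routes the two output sequences to separate ports follows. The status attribute and the port names ready and pending are assumptions made for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" sequence="true"/>
  <!-- The primary output is connected to the matched port by default. -->
  <p:output port="ready" sequence="true" primary="true"/>
  <!-- The not-matched port has to be connected by name. -->
  <p:output port="pending" sequence="true">
    <p:pipe step="split" port="not-matched"/>
  </p:output>

  <!-- The test is evaluated once per document, with that document as
       the context; it does not call last(), so streaming is possible. -->
  <p:split-sequence name="split" test="/*/@status = 'ready'"/>
</p:declare-step>
```

Adding initial-only="true" would send every document from the first non-matching one onward to the pending port, even if a later document has the ready status.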
7.1.24 p:store
The
p:store
step stores a
serialized version of its input to a URI. This step outputs a
reference to the location of the stored document.
<p:declare-step type="p:store">
  <p:input port="source"/>
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>
  <p:option name="byte-order-mark"/>
  <p:option name="cdata-section-elements" select="''"/>
  <p:option name="doctype-public"/>
  <p:option name="doctype-system"/>
  <p:option name="encoding"/>
  <p:option name="escape-uri-attributes" select="'false'"/>
  <p:option name="include-content-type" select="'true'"/>
  <p:option name="indent" select="'false'"/>
  <p:option name="media-type"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="normalization-form" select="'none'"/>
  <p:option name="omit-xml-declaration" select="'true'"/>
  <p:option name="standalone" select="'omit'"/>
  <p:option name="undeclare-prefixes"/>
  <p:option name="version" select="'1.0'"/>
</p:declare-step>
The value of the
href
option
must
be an
anyURI
. If it is relative, it is made absolute against
the base URI of the element on which it is specified (
p:with-option
or
p:store
in the case of a
syntactic shortcut
value).
The step attempts to store the XML document to the specified
URI.
It is a dynamic error (err:XC0050) if the URI scheme is not
supported or the step cannot store to the specified location.
The output of this step is a document containing a single
c:result
element whose content is the absolute URI of the document stored by
the step.
The standard serialization options are provided to control the
serialization of the XML content when it is stored. These options
are as specified in
Section 7.3, “Serialization
Options”
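A sketch of storing a document and passing on the resulting c:result follows. The relative path out/report.xml and the step name save are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <!-- The result port of p:store is not primary, so it is piped by name. -->
  <p:output port="result">
    <p:pipe step="save" port="result"/>
  </p:output>

  <!-- Serialize with indentation and an XML declaration. The relative
       href is resolved against the base URI of this p:store element. -->
  <p:store name="save"
           href="out/report.xml"
           indent="true"
           omit-xml-declaration="false"/>
</p:declare-step>
```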
7.1.25 p:string-replace
The
p:string-replace
step matches
nodes in the document provided on the
source
port and replaces them with the string result of evaluating an
XPath expression.
<p:declare-step type="p:string-replace">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="replace" required="true"/>
</p:declare-step>
The value of the
match
option
must
be an XSLTMatchPattern.
The value of the
replace
option
must
be an XPathExpression.
The matched nodes are specified with the match pattern in the
match
option. For each matching node, the
XPath expression provided by the
replace
option is evaluated with the matching node as the XPath context
node. The string value of the result is used in the output. Nodes
that do not match are copied without change.
If the expression given in the
match
option matches an
attribute
, the string value of the
replace
expression is used as the new value
of the attribute in the output. If the attribute is named
“xml:base”, the base URI of the
element
must
also be amended
accordingly.
If the expression matches any other kind of node, the entire
node (and
not
just its contents) is replaced by the string
value of the
replace
expression.
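A sketch of the step follows. The element name version and the “v” prefix are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Each text node child of a version element is replaced by the
       string "v" followed by its original value; "." is the matched
       text node. -->
  <p:string-replace match="version/text()"
                    replace="concat('v', .)"/>
</p:pipeline>
```

Matching the version element itself, rather than its text, would replace the whole element with the resulting string, as the last paragraph above describes.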
7.1.26 p:unescape-markup
The
p:unescape-markup
step takes
the string value of the document element and parses the content as
if it was a Unicode character stream containing serialized XML. The
output consists of the same document element with children that
result from the parse. This is the reverse of the
p:escape-markup
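step.
<p:declare-step type="p:unescape-markup">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="namespace"/>
  <p:option name="content-type" select="'application/xml'"/>
  <p:option name="encoding"/>
  <p:option name="charset"/>
</p:declare-step>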
The value of the
namespace
option
must
be an
anyURI
. It
should
be
absolute, but will not be resolved.
When the string value is parsed, the original document element
is preserved so that the result will be well-formed XML even if the
content consists of multiple, sibling elements.
The
namespace
option specifies a default
namespace. Elements that are in no namespace in the unescaped
content will be placed into this namespace unless there is an
in-scope namespace declaration that specifies a different namespace
(or explicitly undeclares the default namespace).
The
content-type
option
may
be used to specify an alternate content type
for the string value. An implementation
may
use a different parser to produce XML content
depending on the specified content-type. For example, an
implementation might provide an HTML to XHTML parser (e.g. [HTML Tidy] or
[TagSoup]) for the content type 'text/html'.
All implementations
must
support
the content type
application/xml
, and must
use a standard XML parser for it.
It is a dynamic error (err:XC0051) if the
content-type specified is not supported by the implementation.
Behavior of p:unescape-markup for content-types other than application/xml
is implementation-defined.
The
encoding
option specifies how the
data is encoded. All implementations
must
support the
base64
encoding (and the absence of an encoding option, which implies that
the content is plain Unicode text).
It is a dynamic error (err:XC0052) if the
encoding specified is not supported by the implementation.
If an
encoding
is specified, a
charset
may also be specified. The
character set may be specified as a parameter on the
content-type
or via the separate
charset
option. If it is specified in both places,
the value of the
charset
option
must
be used.
If the specified
encoding
is
base64
, then the character set
must
be specified.
It is a dynamic error (err:XC0010) if an
encoding of
base64
is specified and the
character set is not specified or if the specified character set is
not supported by the implementation.
The octet-stream that results from decoding the text
must
be interpreted using the
character encoding named by the value of the
charset
option to produce a sequence of Unicode
characters to parse.
If no
encoding
is specified, the
character set is ignored, irrespective of where it was
specified.
For example, with the 'namespace' option set to the XHTML
namespace, the following input:
<p>This is a chunk.</p>
<p>This is a another chunk.</p>
would produce:
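A sketch of this example follows, assuming that the escaped paragraphs are the content of a hypothetical description document element. The exact placement of namespace declarations in a serialized result may differ between implementations.

```xml
<!-- Input on the source port: the markup is escaped, so the document
     element has only text content. -->
<description>
&lt;p&gt;This is a chunk.&lt;/p&gt;
&lt;p&gt;This is a another chunk.&lt;/p&gt;
</description>

<!-- Output on the result port: the same document element, whose
     children are the parsed paragraphs in the XHTML namespace. -->
<description>
<p xmlns="http://www.w3.org/1999/xhtml">This is a chunk.</p>
<p xmlns="http://www.w3.org/1999/xhtml">This is a another chunk.</p>
</description>
```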
7.1.27 p:unwrap
The
p:unwrap
step replaces matched
elements with their children.
<p:declare-step type="p:unwrap">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>
The value of the
match
option
must
be an XSLTMatchPattern.
It is a dynamic error (err:XC0023) if that pattern matches anything
other than element nodes.
Every element in the
source
document that
matches the specified
match
pattern is
replaced by its children, effectively “unwrapping” the children
from their parent. Non-element nodes and unmatched elements are
passed through unchanged.
Note
The matching applies to the entire document, not just the
“top-most” matches. A pattern of the form
h:div
will replace
all
h:div
elements, not just the top-most ones.
This step produces a single document; if the document element is
unwrapped, the result might not be well-formed XML.
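A sketch matching the note above follows. The prefix h is bound to the XHTML namespace for the purposes of this example.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:h="http://www.w3.org/1999/xhtml"
            version="1.0">
  <!-- Every h:div, at any depth, is replaced by its children. -->
  <p:unwrap match="h:div"/>
</p:pipeline>
```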
7.1.28 p:wrap
The
p:wrap
step wraps matching
nodes in the
source
document with a new
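parent element.
<p:declare-step type="p:wrap">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="wrapper" required="true"/>
  <p:option name="wrapper-prefix"/>
  <p:option name="wrapper-namespace"/>
  <p:option name="match" required="true"/>
  <p:option name="group-adjacent"/>
</p:declare-step>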
The value of the
wrapper
option
must
be a
QName.
If the lexical value does not contain a colon, then the
wrapper-namespace
may be used to specify the
namespace of the wrapper. In that case, the
wrapper-prefix
may be specified to suggest a
prefix for the wrapper element.
It is a dynamic error (err:XD0034) to
specify a new namespace or prefix if the lexical value of the
specified name contains a colon.
The value of the
match
option
must
be an XSLTMatchPattern.
It is a dynamic error (err:XC0023) if the pattern matches anything
other than document, element, text, processing instruction, or
comment nodes.
The value of the
group-adjacent
option
must
be an XPathExpression.
If the node matched is the document node (
match="/"
), the result is a new document where the
document element is a new element node whose QName is the value
specified in the
wrapper
option. That new
element contains copies of all of the children of the original
document node.
When the match pattern does not match the document node, every
node that matches the specified
match
pattern is replaced with a new element node whose QName is the
value specified in the
wrapper
option. The
content of that new element is a copy of the original, matching
node. The
p:wrap
step performs a
"deep" wrapping: the children of the matching node and their
descendants are processed and wrappers are added to all matching
nodes.
The
group-adjacent
option can be used to
group adjacent matching nodes in a single wrapper element. The
specified XPath expression is evaluated for each matching node with
that node as the XPath context node. Whenever two or more adjacent
matching nodes have the same “group adjacent” value, they are
wrapped together in a single wrapper element.
Two matching nodes are considered adjacent if and only if they
are siblings and either there are no nodes between them or all
intervening, non-matching nodes are whitespace text, comment, or
processing instruction nodes.
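The grouping behavior can be sketched as follows. The element names item and group and the type attribute are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Adjacent item siblings with the same type value share one group
       wrapper; a change in the value starts a new wrapper. -->
  <p:wrap match="item"
          wrapper="group"
          group-adjacent="@type"/>
</p:pipeline>
```

Under these assumptions, sibling items with the type values a, a, b, a would be expected to yield three group elements, containing two, one and one items respectively.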
7.1.29 p:wrap-sequence
The
p:wrap-sequence
step accepts a
sequence of documents and produces either a single document or a
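new sequence of documents.
<p:declare-step type="p:wrap-sequence">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:option name="wrapper" required="true"/>
  <p:option name="wrapper-prefix"/>
  <p:option name="wrapper-namespace"/>
  <p:option name="group-adjacent"/>
</p:declare-step>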
The value of the
wrapper
option
must
be a
QName.
If the lexical value does not contain a colon, then the
wrapper-namespace
may be used to specify the
namespace of the wrapper. In that case, the
wrapper-prefix
may be specified to suggest a
prefix for the wrapper element.
It is a dynamic error (err:XD0034) to
specify a new namespace or prefix if the lexical value of the
specified name contains a colon.
The value of the
group-adjacent
option
must
be an XPathExpression.
In its simplest form,
p:wrap-sequence
takes a sequence of documents
and produces a single, new document by placing each document in the
source
sequence inside a new document element
as sequential siblings. The name of the document element is the
value specified in the
wrapper
option.
The
group-adjacent
option can be used to
group adjacent documents. The
XPath
context
for the
group-adjacent
option
changes over time. For each document that appears on the
source
port, the expression is evaluated with that
document as the context document. The context position (
position()
) is the position of that document within the
sequence and the context size (
last()
) is the
total number of documents in the sequence. Whenever two or more
sequentially adjacent documents have the same “group adjacent”
value, they are wrapped together in a single wrapper element.
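A sketch that groups adjacent documents by the name of their document element follows. The wrapper name batch is arbitrary.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <!-- The expression is evaluated with each document as the context;
       consecutive documents with the same document element name are
       wrapped together in a single batch element. -->
  <p:wrap-sequence wrapper="batch"
                   group-adjacent="local-name(/*)"/>
</p:declare-step>
```

Without group-adjacent, the whole input sequence would be placed inside a single batch element.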
7.1.30 p:xinclude
The
p:xinclude
step applies
XInclude
processing to the
source
document.
<p:declare-step type="p:xinclude">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="fixup-xml-base" select="'false'"/>
  <p:option name="fixup-xml-lang" select="'false'"/>
</p:declare-step>
The value of the
fixup-xml-base
option
must
be a boolean. If it is true, base
URI fixup will be performed as per [
XInclude
].
The value of the
fixup-xml-lang
option
must
be a boolean. If it is true,
language fixup will be performed as per [
XInclude
].
The included documents are located with the base URI of the
input document and are not provided as input to the step.
It is a dynamic error (err:XC0029) if an XInclude error occurs
during processing.
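A minimal sketch with both fixup options enabled:

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Expand XInclude elements in the source document and request
       xml:base and xml:lang fixup on the included content. -->
  <p:xinclude fixup-xml-base="true" fixup-xml-lang="true"/>
</p:pipeline>
```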
7.1.31 p:xslt
The
p:xslt
step applies an [XSLT 1.0] or [XSLT 2.0] stylesheet to a document.
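<p:declare-step type="p:xslt">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="stylesheet"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" primary="true"/>
  <p:output port="secondary" sequence="true"/>
  <p:option name="initial-mode"/>
  <p:option name="template-name"/>
  <p:option name="output-base-uri"/>
  <p:option name="version"/>
</p:declare-step>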
If present, the value of the
initial-mode
option
must
be a
QName.
If present, the value of the
template-name
option
must
be a
QName.
If present, the value of the
output-base-uri
option
must
be an
anyURI
. If it is
relative, it is made absolute against the base URI of the element
on which it is specified (
p:with-option
or
p:xslt
in the case of a
syntactic shortcut
value).
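The resolution of a relative output-base-uri value can be sketched in Python with the standard urljoin function. The function name resolve_option_uri and the example URIs are hypothetical; this is a minimal sketch of ordinary relative-reference resolution, not a statement of how any XProc processor implements it.

```python
from urllib.parse import urljoin

def resolve_option_uri(value, element_base_uri):
    """Make a possibly relative URI absolute against the base URI of the
    element on which the option is specified (hypothetical helper)."""
    return urljoin(element_base_uri, value)

# A relative value is resolved against the element's base URI.
absolute = resolve_option_uri(
    "out/result.xml", "http://example.com/pipelines/main.xpl")

# An already absolute value is returned unchanged.
unchanged = resolve_option_uri(
    "http://example.org/x.xml", "http://example.com/pipelines/main.xpl")
```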
If the step specifies a
version
, then
that version of XSLT
must
be used to
process the transformation.
It is a dynamic error (err:XC0038) if the
specified version of XSLT is not available. If the step does not
specify a version, the implementation may use any version it has
available and may use any means to determine what version to use,
including, but not limited to, examining the version of the
stylesheet.
The XSLT stylesheet provided on the
stylesheet
port is applied to the document on the
source
port. Any parameters passed on the
parameters
port are used to define top-level
stylesheet parameters. The primary result document of the
transformation appears on the
result
port.
All other result documents appear on the
secondary
port. If XSLT 1.0 is used, an empty sequence
of documents
must
appear on the
secondary
port.
If a sequence of documents is provided on the
source
port, the first document is used as the primary
input document. The whole sequence is also the default collection.
If no documents are provided on the
source
port, the primary input document is undefined and the default
collection is empty.
It is a dynamic error (err:XC0039) if a
sequence of documents (including an empty sequence) is provided to
an XSLT 1.0 step.
A dynamic error occurs if the XSLT processor signals a fatal
error. This includes the case where the transformation terminates
due to an xsl:message instruction with a terminate attribute value of “yes”.
How XSLT message termination errors are reported to
the XProc processor is
implementation-dependent.
The invocation of the transformation is controlled by the
initial-mode
and
template-name
options that set the initial mode
and/or named template in the XSLT transformation where processing
begins.
It is a dynamic error (err:XC0056) if the specified initial mode or
named template cannot be applied to the specified stylesheet.
The
output-base-uri
option sets the
context's output base URI per the XSLT 2.0 specification, otherwise
the base URI of the
result
document is the
base URI of the first document in the
source
port's sequence. If the value of the
output-base-uri
option is not absolute, it will be
resolved using the base URI of its
p:option
element. An XSLT 1.0 step
should
use the value of the
output-base-uri
as the base URI of its output, if the
option is specified.
If XSLT 2.0 is used, the outputs of this step
may
include PSVI annotations.
The static and initial dynamic contexts of the XSLT processor
are the contexts defined in
Section 2.6.1.2, “Step XPath Context”
for an XSLT 1.0 processor and
Section 2.6.2.2, “Step XPath
Context”
for an XSLT 2.0 processor with the following
adjustments.
The dynamic context is augmented as follows:
Context item
The first document that appears on the
source
port.
Variable values
Any parameters passed on the
parameters
port are available as variable bindings to the XSLT processor.
Function implementations
The function implementations provided by the XSLT processor.
Default collection
The sequence of documents provided on the
source
port.
7.2 Optional Steps
The following steps are optional. If they are supported by a
processor, they must conform to the semantics outlined here, but a
conformant processor is not required to support all (or any) of
these steps.
7.2.1 p:exec
The
p:exec
step runs an external
command passing the input that arrives on its
source
port as standard input, reading
result
from standard output, and
errors
from standard error.
<p:declare-step type="p:exec">
  <p:input port="source" primary="true" sequence="true"/>
  <p:output port="result" primary="true"/>
  <p:output port="errors"/>
  <p:output port="exit-status"/>
  <p:option name="command" required="true"/>
  <p:option name="args" select="''"/>
  <p:option name="cwd"/>
  <p:option name="source-is-xml" select="'true'"/>
  <p:option name="result-is-xml" select="'true'"/>
  <p:option name="wrap-result-lines" select="'false'"/>
  <p:option name="errors-is-xml" select="'false'"/>
  <p:option name="wrap-error-lines" select="'false'"/>
  <p:option name="path-separator"/>
  <p:option name="failure-threshold"/>
  <p:option name="arg-separator" select="' '"/>
  <p:option name="byte-order-mark"/>
  <p:option name="cdata-section-elements" select="''"/>
  <p:option name="doctype-public"/>
  <p:option name="doctype-system"/>
  <p:option name="encoding"/>
  <p:option name="escape-uri-attributes" select="'false'"/>
  <p:option name="include-content-type" select="'true'"/>
  <p:option name="indent" select="'false'"/>
  <p:option name="media-type"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="normalization-form" select="'none'"/>
  <p:option name="omit-xml-declaration" select="'true'"/>
  <p:option name="standalone" select="'omit'"/>
  <p:option name="undeclare-prefixes"/>
  <p:option name="version" select="'1.0'"/>
</p:declare-step>
The values of the command, args, cwd, path-separator, and arg-separator options must be strings.
The values of the source-is-xml, result-is-xml, wrap-result-lines, errors-is-xml, and wrap-error-lines options must be boolean.
The
p:exec
step executes the
command passed on
command
with the
arguments passed on
args
. The processor
does not interpolate the values of the
command
or
args
(for example,
expanding references to environment variables).
It is a dynamic error (err:XC0033) if the command cannot be run.
If
cwd
is specified, then the current
working directory is changed to the value of that option before
execution begins.
It is a dynamic error (err:XC0034) if the
current working directory cannot be changed to the value of the
cwd
option.
If
cwd
is not specified, the
current working directory is
implementation-defined.
If the
path-separator
option is
specified, every occurrence of the character identified as the
path-separator
character that occurs in the
command, args, or cwd
will be replaced by the
platform-specific path separator character.
It is a dynamic error (err:XC0063) if the
path-separator
option is specified and is not exactly
one character long.
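The path-separator substitution can be sketched as follows. The function name fix_path_separator is hypothetical, and the use of Python's os.sep as the "platform-specific path separator character" is an assumption made for illustration.

```python
import os

def fix_path_separator(text, path_separator, platform_separator=os.sep):
    """Replace every occurrence of the declared path-separator character
    with the platform-specific separator (hypothetical helper)."""
    if len(path_separator) != 1:
        # Mirrors err:XC0063: the option must be exactly one character long.
        raise ValueError("err:XC0063: path-separator must be one character")
    return text.replace(path_separator, platform_separator)

# The same substitution would be applied to command, args, and cwd.
# The platform separator is passed explicitly here so the result does not
# depend on the machine running the example.
windows_style = fix_path_separator("/usr/bin/tool", "/", "\\")
```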
The value of the
args
option is a
string. In order to support passing more than one argument to a
command, the
args
string is broken into a
sequence of values. The
arg-separator
option specifies the character that is used to separate values; by
default it is a single space.
It is a dynamic error (err:XC0066) if the
arg-separator
option is specified and is
not exactly one character long.
The following examples of
p:exec
are equivalent. The first uses the default
arg-separator:
The second specifies an alternate separator:
If one of the arguments contains a space (e.g., a filename that
contains a space), then you must specify an alternate
separator.
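The way the args string is broken into a sequence of values can be sketched in Python. The function name split_args is hypothetical; treating an empty args string as "no arguments" is an assumption, since that case is not spelled out above.

```python
def split_args(args, arg_separator=" "):
    """Break the args string into a list of argument values
    (hypothetical helper)."""
    if len(arg_separator) != 1:
        # Mirrors err:XC0066: the option must be exactly one character long.
        raise ValueError("err:XC0066: arg-separator must be one character")
    if args == "":
        return []  # assumption: an empty string means no arguments
    return args.split(arg_separator)

default_split = split_args("arg1 arg2 arg3")
comma_split = split_args("arg1,arg2,arg3", ",")

# An argument containing a space survives only with an alternate separator.
with_space = split_args("my file.txt,-v", ",")
```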
The
source
port is declared to accept a
sequence so that it can be empty. If no document appears on the
source
port, then the command receives
nothing on standard input. If a document does arrive on the
source
port, it will be passed to the command
as its standard input.
It is a dynamic error (err:XD0006) if more
than one document appears on the
source
port
of the
p:exec
step. If
source-is-xml
is true, the serialization options are
used to convert the input into serialized XML which is passed to
the command, otherwise the XPath string-value of the document is
passed.
The standard output of the command is read and returned on
result
; the standard error output is read and
returned on
errors
. In order to assure that
the result will be an XML document, each of the results will be
wrapped in a
c:result
element.
If
result-is-xml
is true, the standard
output of the program is assumed to be XML and will be parsed as a
single document. If it is false, the output is assumed
not
to be XML and will be returned as escaped text.
If
wrap-result-lines
is
true, a
c:line
element will be wrapped around each line of output.
<c:line>string</c:line>
It is a dynamic error (err:XC0035) to specify both result-is-xml and wrap-result-lines.
The same rules apply to the standard error output of the
program, with the
errors-is-xml
and
wrap-error-lines
options, respectively.
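The wrapping of non-XML output can be sketched in Python. The function name wrap_output is hypothetical, and the output is built as a string purely for illustration; the namespace URI bound to the c prefix is the XProc step vocabulary namespace.

```python
from xml.sax.saxutils import escape

C_NS = "http://www.w3.org/ns/xproc-step"

def wrap_output(text, wrap_lines=False):
    """Wrap command output that is not XML in a c:result element,
    optionally wrapping each line in c:line (hypothetical helper)."""
    if wrap_lines:
        body = "".join(
            "<c:line>%s</c:line>" % escape(line)
            for line in text.splitlines())
    else:
        # Returned as escaped text, so markup characters are not parsed.
        body = escape(text)
    return '<c:result xmlns:c="%s">%s</c:result>' % (C_NS, body)

plain = wrap_output("a<b")
lines = wrap_output("a<b\nc", wrap_lines=True)
```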
If either of the results are XML, they
must
be parsed with namespaces enabled and
validation turned off, just like
p:document.
The
exit-status
port always returns a
single
c:result
element which contains the system
exit status that the process returned.
The specific exit status values returned by a process
invoked with
p:exec
are
implementation-dependent.
If a
failure-threshold
value is
supplied, and the exit status is greater than that threshold, then
the
p:exec
step
must
fail.
It is a dynamic error (err:XC0064) if the
exit code from the command is greater than the specified
failure-threshold
value. This failure, like any step
failure, can be captured with a
p:try.
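The failure-threshold check can be sketched in Python using the standard subprocess module. The function name run_with_threshold is hypothetical, and raising a Python exception is only a stand-in for the step failing with err:XC0064.

```python
import subprocess
import sys

def run_with_threshold(command, args, failure_threshold=None):
    """Run a command and fail if its exit status is greater than the
    threshold (hypothetical helper)."""
    completed = subprocess.run([command] + args, capture_output=True)
    status = completed.returncode
    if failure_threshold is not None and status > failure_threshold:
        # Stand-in for the dynamic error err:XC0064.
        raise RuntimeError(
            "err:XC0064: exit status %d exceeds threshold %d"
            % (status, failure_threshold))
    return status

# The Python interpreter itself serves as a portable example command.
exit_three = ["-c", "import sys; sys.exit(3)"]
status = run_with_threshold(sys.executable, exit_three, failure_threshold=5)
```

An exit status equal to the threshold does not cause failure; only a strictly greater status does.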
7.2.2 p:hash
The
p:hash
step generates a hash,
or digital “fingerprint”, for some value and injects it into the
source
document.
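<p:declare-step type="p:hash">
  <p:input port="source" primary="true"/>
  <p:output port="result"/>
  <p:input port="parameters" kind="parameter"/>
  <p:option name="value" required="true"/>
  <p:option name="algorithm" required="true"/>
  <p:option name="match" required="true"/>
  <p:option name="version"/>
</p:declare-step>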
The value of the
algorithm
option must
be a QName. If it does not have a prefix, then it must be one of
the following values: “crc”, “md”, or “sha”.
If a
version
is not specified,
the default version is algorithm-defined. For “
crc
” it is 32, for “
md
” it
is 5, for “
sha
” it is 1.
A hash is constructed from the string specified in the
value
option using the specified algorithm
and version. Implementations
must
support [CRC32], [MD5], and [SHA1] hashes.
It is
implementation-defined
what
other algorithms are supported. The resulting hash
should
be returned as a string of hexadecimal
characters.
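The three required hashes, with their default versions, can be sketched in Python using the standard zlib and hashlib modules. The function name compute_hash is hypothetical, and encoding the value as UTF-8 before hashing is an assumption; the text above does not say how the string is converted to bytes.

```python
import hashlib
import zlib

# Default version per algorithm, as listed above.
DEFAULT_VERSIONS = {"crc": "32", "md": "5", "sha": "1"}

def compute_hash(value, algorithm, version=None):
    """Return the hash of `value` as a string of hexadecimal characters
    (hypothetical helper covering only the three required hashes)."""
    if algorithm not in DEFAULT_VERSIONS:
        raise ValueError("err:XC0036: unsupported algorithm %r" % algorithm)
    version = DEFAULT_VERSIONS[algorithm] if version is None else str(version)
    data = value.encode("utf-8")  # assumption: UTF-8 byte encoding
    if (algorithm, version) == ("crc", "32"):
        return "%08x" % (zlib.crc32(data) & 0xFFFFFFFF)
    if (algorithm, version) == ("md", "5"):
        return hashlib.md5(data).hexdigest()
    if (algorithm, version) == ("sha", "1"):
        return hashlib.sha1(data).hexdigest()
    raise ValueError("err:XC0036: unsupported version %r" % version)

crc = compute_hash("abc", "crc")
md5 = compute_hash("abc", "md")
sha1 = compute_hash("abc", "sha", version=1)
```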
The value of the
match
option must be an
XSLTMatchPattern.
The hash of the specified value is computed using the algorithm
and parameters specified.
It is a dynamic error (err:XC0036) if the
requested hash algorithm is not one that the processor understands
or if the value or parameters are not appropriate for that
algorithm.
The matched nodes are specified with the match pattern in the
match
option. For each matching node, the
string value of the computed hash is used in the output (if more
than one node matches, the
same
hash value is used in each
match). Nodes that do not match are copied without change.
If the expression given in the
match
option matches an
attribute
, the hash is used as the new
value of the attribute in the output. If the attribute is named
“xml:base
”, the base URI of the
element
must
also be amended
accordingly.
If the expression matches any other kind of node, the entire
node (and
not
just its contents) is replaced by the
hash.
7.2.3 p:uuid
The
p:uuid
step generates a [UUID] and
injects it into the
source
document.
<p:declare-step type="p:uuid">
  <p:input port="source" primary="true"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="version"/>
</p:declare-step>
The value of the
match
option must be an
XSLTMatchPattern. The value of the
version
option must be an integer.
If the
version
is specified, that
version of UUID must be computed.
It is a dynamic error (err:XC0060) if the
processor does not support the specified
version
of the UUID algorithm.
If the
version
is not
specified, the version of UUID computed is
implementation-defined.
Implementations
must
support
version 4 UUIDs.
Support for
other versions of UUID, and the mechanism by which the necessary
inputs are made available for computing other versions, is
implementation-defined.
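Version selection for the UUID can be sketched in Python with the standard uuid module. The function name generate_uuid is hypothetical. This sketch supports only version 4 and chooses version 4 when no version is given; that default is an assumption, since the unspecified case is implementation-defined.

```python
import uuid

def generate_uuid(version=None):
    """Generate a UUID string for the requested version
    (hypothetical helper supporting version 4 only)."""
    if version is None or int(version) == 4:
        return str(uuid.uuid4())
    # Stand-in for the dynamic error err:XC0060.
    raise ValueError("err:XC0060: UUID version %s is not supported" % version)

generated = generate_uuid()
```

The single generated value would then be used for every matched node, so all matches in one invocation of the step receive the same UUID.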
The matched nodes are specified with the match pattern in the
match
option. For each matching node, the
generated UUID is used in the output (if more than one node
matches, the
same
UUID is used in each match). Nodes that
do not match are copied without change.
If the expression given in the
match
option matches an
attribute
, the UUID is used as the new
value of the attribute in the output. If the attribute is named
“xml:base
”, the base URI of the
element
must
also be amended
accordingly.
If the expression matches any other kind of node, the entire
node (and
not
just its contents) is replaced by the
UUID.
7.2.4 p:validate-with-relax-ng
The
p:validate-with-relax-ng
step
applies [
RELAX
NG
] validation to the
source
document.
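<p:declare-step type="p:validate-with-relax-ng">
  <p:input port="source" primary="true"/>
  <p:input port="schema"/>
  <p:output port="result"/>
  <p:option name="dtd-attribute-values" select="'false'"/>
  <p:option name="dtd-id-idref-warnings" select="'false'"/>
  <p:option name="assert-valid" select="'true'"/>
</p:declare-step>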
The values of the
dtd-attribute-values
and
dtd-id-idref-warnings
options
must
be booleans.
If the root element of the schema is
c:data
or has a
c:content-type
attribute that
specifies a text content type or a media type that the
implementation recognizes, then the step
should
treat the text node descendants of the
element as a [
RELAX NG Compact Syntax
] document for
validation.
If the
dtd-attribute-values
option is
true
, then the attribute value defaulting
conventions of [
RELAX NG DTD Compatibility
] are also
applied.
If the
dtd-id-idref-warnings
option is
true
, then the validator
should
treat a schema that is incompatible with
the ID/IDREF/IDREFs feature of [
RELAX NG DTD
Compatibility
] as if the document was invalid.
It is a dynamic error (err:XC0053) if the
assert-valid
option is
true
and the input document is not valid.
The output from this step is a copy of the input, possibly
augmented by application of the [
RELAX NG DTD
Compatibility
]. The output of this step
may
include PSVI annotations.
Support for [
RELAX NG DTD
Compatibility
] is
implementation-defined.
7.2.5 p:validate-with-schematron
The
p:validate-with-schematron
step
applies [
Schematron
] processing to the
source
document.
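<p:declare-step type="p:validate-with-schematron">
  <p:input port="parameters" kind="parameter"/>
  <p:input port="source" primary="true"/>
  <p:input port="schema"/>
  <p:output port="result" primary="true"/>
  <p:output port="report" sequence="true"/>
  <p:option name="phase" select="'#ALL'"/>
  <p:option name="assert-valid" select="'true'"/>
</p:declare-step>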
It is a dynamic error (err:XC0054) if the
assert-valid
option is
true
and any Schematron assertions fail.
The value of the
phase
option identifies
the Schematron validation phase with which validation begins.
The
parameters
port provides name/value
pairs which correspond to Schematron external variables.
The
result
output from this step is a copy
of the input.
Schematron assertions and reports, if any,
must
appear on the
report
port. The output
should
be in
Schematron Validation Report Language
(SVRL).
The output of this step
may
include
PSVI annotations.
7.2.6 p:validate-with-xml-schema
The
p:validate-with-xml-schema
step
applies [
W3C XML
Schema: Part 1
] validity assessment to the
source
input.
<p:declare-step type="p:validate-with-xml-schema">
  <p:input port="source" primary="true"/>
  <p:input port="schema" sequence="true"/>
  <p:output port="result"/>
  <p:option name="use-location-hints" select="'false'"/>
  <p:option name="try-namespaces" select="'false'"/>
  <p:option name="assert-valid" select="'true'"/>
  <p:option name="mode" select="'strict'"/>
</p:declare-step>
The values of the use-location-hints, try-namespaces, and assert-valid options must be boolean.
The value of the
mode
option
must
be an NMTOKEN whose value is
either “
strict
” or “
lax
”.
Validation is performed against the set of schemas represented
by the documents on the
schema
port. These
schemas must be used in preference to any schema locations provided
by schema location hints encountered during schema validation, that
is, schema locations supplied for
xs:import
or
xsi:schema-location
, or determined by
schema-processor-defined namespace-based strategies, for the
namespaces covered by the documents available on the schema
port.
If
xs:include
elements occur within the
supplied schema documents, they are treated like any other
external documents.
It is
implementation-defined
if the
documents supplied on the
schema
port are
considered when resolving
xs:include
elements
in the schema documents provided.
The
use-location-hints
and
try-namespaces
options allow the pipeline author to
control how the schema processor should attempt to locate schema
documents necessary but not provided on the
schema
port. Any schema documents provided on the
schema
port
must
be used in preference to schema documents located by other
means.
If the
use-location-hints
option is
“true
”, the processor
should
make use of schema location hints to locate
schema documents. If the option is “
false
”, the processor
should
ignore any such hints.
If the
try-namespaces
option is
“true
”, the processor
should
attempt to dereference the namespace URI to
locate schema documents. If the option is “
false
”, the processor
should
not
dereference namespace URIs.
The
mode
option allows the pipeline
author to control how schema validation begins. The “
strict
” mode means that the document element must be
declared and schema-valid, otherwise it will be treated as invalid.
The “
lax
” mode means that the absence of a
declaration for the document element does not itself count as an
unsuccessful outcome of validation.
It is a dynamic error (err:XC0053) if the
assert-valid
option is
true
and the input document is not valid. If the
assert-valid
option is
false
, it is not an error for the document to be
invalid. In this case, if the implementation does not support the
PSVI,
p:validate-with-xml-schema
is
essentially just an “identity” step, but if the implementation
does
support the PSVI, then the resulting document will
have additional type information (at least for the subtrees that
are valid).
When XML Schema validation assessment is performed, the
processor is invoked in the mode specified by the
mode
option.
It is a dynamic error (err:XC0055) if the
implementation does not support the specified mode.
The
result
of the assessment is a document
with the Post-Schema-Validation-Infoset (PSVI) ([
W3C XML Schema: Part 1
]) annotations, if the pipeline implementation supports
such annotations. If not, the input document is reproduced with any
defaulting of attributes and elements performed as specified by the
XML Schema recommendation.
7.2.7 p:www-form-urldecode
The
p:www-form-urldecode
step
decodes an
x-www-form-urlencoded
string
into a set of parameters.
<p:declare-step type="p:www-form-urldecode">
  <p:output port="result"/>
  <p:option name="value" required="true"/>
</p:declare-step>
The
value
option is interpreted as a
string of parameter values encoded using the
x-www-form-urlencoded
algorithm. It turns each such
encoded name/value pair into a parameter. The entire set of
parameters is written (as a
c:param-set
) on the
result
output port.
It is a dynamic error (err:XC0037) if the
value
provided is not a properly
x-www-form-urlencoded
value.
It is a dynamic error (err:XC0061) if the name of any encoded
parameter is not a valid
xs:NCName
. In
other words, this step can only decode simple name/value pairs
where the names do not contain colons or any characters that cannot
be used in XML names.
The order of the
c:param
elements in the result is the same
as the order of the encoded parameters, reading from left to
right.
If any parameter name occurs more than once in the encoded
string, the resulting parameter set will contain a
c:param
for each
instance. However, only one of these will actually be used if the
parameter set is passed to another step on its
parameter input
port.
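The decoding rules can be sketched in Python with the standard parse_qsl function. The function name www_form_urldecode is hypothetical, the result is a list of name/value pairs rather than a c:param-set document, and the NCName check is an ASCII-only approximation of the real xs:NCName production.

```python
import re
from urllib.parse import parse_qsl

# ASCII-only approximation of xs:NCName (an assumption for illustration).
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")

def www_form_urldecode(value):
    """Decode an x-www-form-urlencoded string into (name, value) pairs,
    preserving order and repeated names (hypothetical helper)."""
    try:
        pairs = parse_qsl(value, keep_blank_values=True, strict_parsing=True)
    except ValueError as exc:
        # Stand-in for the dynamic error err:XC0037.
        raise ValueError("err:XC0037: not properly encoded") from exc
    for name, _ in pairs:
        if not NCNAME.match(name):
            # Stand-in for the dynamic error err:XC0061.
            raise ValueError("err:XC0061: %r is not an NCName" % name)
    return pairs

params = www_form_urldecode("a=1&b=two+words&a=3")
```

The repeated name "a" yields two entries, in left-to-right order, matching the rule that the parameter set contains a c:param for each instance.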
7.2.8 p:www-form-urlencode
The
p:www-form-urlencode
step
encodes a set of parameter values as an
x-www-form-urlencoded
string and injects it into the
source
document.
<p:declare-step type="p:www-form-urlencode">
  <p:input port="source" primary="true"/>
  <p:output port="result"/>
  <p:input port="parameters" kind="parameter"/>
  <p:option name="match" required="true"/>
</p:declare-step>
The value of the
match
option must be an
XSLTMatchPattern.
The set of parameters is encoded as a single
x-www-form-urlencoded
string of name/value pairs.
When parameters are encoded into name/value pairs,
only
the local name of each parameter is used. The namespace name is
ignored and no prefix or colon appears in the name.
The parameters are encoded in document order. That is, the first
parameter appears first in the resulting string, the second
parameter second, etc. reading from left to right.
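The encoding rules can be sketched in Python with the standard urlencode function. The function name www_form_urlencode and the representation of each parameter as a ((namespace, local name), value) tuple are hypothetical choices made for illustration.

```python
from urllib.parse import urlencode

def www_form_urlencode(params):
    """Encode parameters as one x-www-form-urlencoded string.

    `params` is a list of ((namespace, local_name), value) tuples in
    document order (hypothetical representation). Only the local name
    is used; the namespace name is ignored.
    """
    return urlencode([(local, value) for (_ns, local), value in params])

encoded = www_form_urlencode([
    ((None, "name"), "W3C XProc"),
    (("http://example.com/ns", "q"), "a&b"),
])
```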
The matched nodes are specified with the match pattern in the
match
option. For each matching node, the
encoded string is used in the output. Nodes that do not match are
copied without change.
If the expression given in the
match
option matches an
attribute
, the encoded string is used as
the new value of the attribute in the output. If the expression
matches any other kind of node, the entire node (and
not
just its contents) is replaced by the encoded string.
7.2.9 p:xquery
The
p:xquery
step applies an
[XQuery 1.0]
query to the sequence of documents provided on the
source
port.
<p:declare-step type="p:xquery">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="query"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true"/>
</p:declare-step>
If a sequence of documents is provided on the
source
port, the first document is used as the initial
context item. The whole sequence is also the default collection. If
no documents are provided on the
source
port,
the initial context item is undefined and the default collection is
empty.
The
query
port must receive a single
document:
If the document root element is
c:query
, the text descendants
of this element are considered the query.
<c:query>string</c:query>
If the document root element is in the XQueryX namespace, the
document is treated as an XQueryX-encoded query.
Support for XQueryX is
implementation-defined.
If the document root element is
c:data
and either does not have a
content-type
attribute or has a
content-type
attribute that specifies a text
content type or a media type that the implementation recognizes,
then the text descendants of this element are considered the
query.
If the document root element is not
c:data
but has a
c:content-type
attribute that specifies a text
content type or a media type that the implementation recognizes,
then the text descendants of this element are considered the
query.
Otherwise, the interpretation
of the query is
implementation-defined.
The result of the
p:xquery
step
must be a sequence of documents.
It is a dynamic error (err:XC0057) if the
sequence that results from evaluating the XQuery contains items
other than documents and elements. Any elements that appear in the
result sequence will be treated as documents with the element as
their document element.
For example:
declare namespace atom="http://www.w3.org/2005/Atom";
/atom:feed/atom:entry
The output of this step
may
include
PSVI annotations.
The static context of the XQuery processor is augmented in the
following way:
Statically known default collection type
document()*
Statically known namespaces:
Unchanged from the implementation defaults. No namespace
declarations in the XProc pipeline are automatically exposed in the
static context.
The dynamic context of the XQuery processor is augmented in the
following way:
Context item
The first document that appears on the
source
port.
Context position
Context size
Variable values
Any parameters passed on the
parameters
port augment any implementation-defined variable bindings known to
the XQuery processor. The parameter values are passed to the XQuery
processor as values of type
xs:untypedAtomic.
Function implementations
The function implementations provided by the XQuery
processor.
Current dateTime
The point in time returned as
the current dateTime is
implementation-defined.
Implicit timezone
The implicit timezone is
implementation-defined.
Available documents
The set of available
documents (those that may be retrieved with a URI) is
implementation-dependent.
Available collections
The set of available
collections is
implementation-dependent.
Default collection
The sequence of documents provided on the
source
port.
7.2.9.1 Example
The following pipeline applies XInclude processing and schema
validation before using XQuery:
Example 11. A Sample Pipeline
Document
Where
countp.xq
might contain:
7.2.10 p:xsl-formatter
The
p:xsl-formatter
step receives
an [XSL 1.1]
document and renders the content. The result of rendering is stored
to the URI provided via the
href
option. A
reference to that result is produced on the output port.
<p:declare-step type="p:xsl-formatter">
  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>
  <p:option name="content-type"/>
</p:declare-step>
The value of the
href
option
must
be an
anyURI
. If it is relative, it is made absolute against
the base URI of the element on which it is specified (
p:with-option
or
p:xsl-formatter
in the case of a
syntactic shortcut
value).
The content-type of the output is controlled by the
content-type
option. This option specifies a media
type as defined by [
IANA Media Types
]. The option may include media
type parameters as well (e.g. "application/someformat;
charset=UTF-8").
The use of
media type parameters on the
content-type
option is
implementation-defined.
If the
content-type
option is not specified, the output type
is
implementation-defined
. The
default
should
be PDF.
A formatter may take any
number of optional rendering parameters via the step's parameters;
such parameters are defined by the XSL implementation used and are
implementation-defined.
The output of this step is a document containing a single
c:result
element whose content is the absolute URI of the document stored by
the step.
7.3 Serialization Options
Several steps in this step library require serialization options
to control the serialization of XML. These options are used to
control serialization as in the [Serialization] specification.
The following options may be present on steps that perform
serialization:
byte-order-mark
The value of this option
must
be a
boolean. If it's not specified, the default varies by encoding: for
UTF-16 it's true, for all others, it's false.
cdata-section-elements
The value of this option
must
be a
list of
QName
s. They are interpreted as element names.
doctype-public
The value of this option
must
be a
string. The public identifier of the doctype.
doctype-system
The value of this option
must
be an
anyURI
. The system identifier of the doctype.
It need not be absolute, and is not resolved.
encoding
A character set name.
If no
encoding
is specified, the encoding used is
implementation defined
. If
the
method
is “
xml
” or “
xhtml
”, the
implementation defined encoding
must
be either UTF-8 or UTF-16.
escape-uri-attributes
The value of this option
must
be a
boolean. It is ignored unless the specified method is “
xhtml
” or “
html
”.
include-content-type
The value of this option
must
be a
boolean. It is ignored unless the specified method is “
xhtml
” or “
html
”.
indent
The value of this option
must
be a
boolean.
media-type
The value of this option
must
be a
string. It specifies the media type (MIME content type). If not
specified, the default varies according to the method:
xml: application/xml
html: text/html
xhtml: application/xhtml+xml
text: text/plain
For methods other than xml, html, xhtml, and text, the media-type is implementation-defined.
method
The value of this option
must
be a
QName
. It specifies the serialization
method.
normalization-form
The value of this option
must
be an
NMTOKEN, one of the enumerated values NFC, NFD, NFKC, NFKD, fully-normalized, none, or an implementation-defined value.
omit-xml-declaration
The value of this option
must
be a
boolean.
standalone
The value of this option
must
be an
NMTOKEN, one of the enumerated values true, false, or omit.
undeclare-prefixes
The value of this option
must
be a
boolean.
version
The value of this option
must
be a
string.
In order to be consistent with the rest of this specification,
boolean values for the serialization parameters must use one of the
XML Schema lexical forms for boolean: "true", "false", "1", or "0".
This is different from the [Serialization]
specification which uses “yes” and “no”. No change in semantics is
implied by this different spelling.
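The correspondence between the two spellings can be sketched in Python. The function name to_serialization_boolean is hypothetical; it simply maps the XML Schema lexical forms accepted here to the “yes” and “no” spellings of the [Serialization] specification.

```python
# XML Schema boolean lexical forms mapped to Serialization spellings.
_BOOLEAN_FORMS = {"true": "yes", "1": "yes", "false": "no", "0": "no"}

def to_serialization_boolean(lexical):
    """Translate an XProc boolean option value into the yes/no spelling
    (hypothetical helper)."""
    try:
        return _BOOLEAN_FORMS[lexical]
    except KeyError:
        raise ValueError("not an XML Schema boolean: %r" % lexical)

indent_value = to_serialization_boolean("true")
omit_value = to_serialization_boolean("0")
```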
The
method
option controls the
serialization method used by this component with standard values of
'html', 'xml', 'xhtml', and 'text' but only the 'xml' value is
required to be supported. The interpretation of the remaining
options is as specified in [
Serialization
].
Implementations may support
other method values but their results are
implementation-defined.
A minimally conforming implementation must support the
xml
output method with the following option
values:
The version option must support the value 1.0.
The encoding option must support the value UTF-8.
The omit-xml-declaration option must be supported. If the value is not specified or has the value false, an XML declaration must be produced.
All other option values may be ignored for the
xml
output method.
If a processor chooses to implement an option for serialization,
it must conform to the semantics defined in the [Serialization] specification.
Note
The use-character-maps parameter of the [Serialization] specification is not among the standard serialization options provided by this specification.
A Conformance
Conformant processors
must
implement all of the features described in this specification
except those that are explicitly identified as optional.
Some aspects of processor behavior are not completely specified;
those features are either
implementation-dependent
or
implementation-defined.
[Definition: An
implementation-dependent
feature is one where the
implementation has discretion in how it is performed.
Implementations are not required to document or explain how
implementation-dependent
features are performed.]
[Definition: An
implementation-defined
feature is one where the
implementation has discretion in how it is performed. Conformant
implementations
must
document how
implementation-defined
features are performed.]
A.1 Implementation-defined features
The following features are implementation-defined:
It is implementation-defined what additional step types, if
any, are provided. See
Section 2.1, “Steps”
How inputs are connected to XML documents outside the pipeline
is implementation-defined. See
Section 2.2, “Inputs and
Outputs”
How pipeline outputs are connected to XML documents outside the
pipeline is implementation-defined. See
Section 2.2, “Inputs and
Outputs”
In Version 1.0 of XProc, how (or if) implementers provide local
resolution mechanisms and how (or if) they provide access to
intermediate results by URI is implementation-defined. See
Section 2.2.1,
“External Documents”
Except for cases which are specifically called out in , the
extent to which namespace fixup, and other checks for outputs which
cannot be serialized, are performed on intermediate outputs is
implementation-defined. See
Section 2.4.1, “Namespace Fixup
on Outputs”
If no version is specified on the step or among its ancestors,
then its XPath version is implementation-defined. See
Section 2.6, “XPaths
in XProc”
The version of Unicode supported is implementation-defined, but
it is recommended that the most recent version of Unicode be used.
See
Section 2.6.2.1, “Processor XPath
Context”
The point in time returned as the current dateTime is
implementation-defined. See
Section 2.6.2.1, “Processor
XPath Context”
The implicit timezone is implementation-defined. See
Section 2.6.2.1, “Processor XPath
Context”
The implicit timezone is implementation-defined. See
Section 2.6.2.2, “Step XPath
Context”
The exact format of the language string is
implementation-defined but should be consistent with the xml:lang
attribute. See
Section 2.7.1, “System
Properties”
It is implementation-defined if the processor supports any
other XPath extension functions. See
Section 2.7.10, “Other XPath
Extension Functions”
Whether or not the pipeline processor supports passing PSVI
annotations between steps is implementation-defined. See
Section 2.8, “PSVIs in
XProc”
The exact PSVI properties that are preserved when documents are
passed between steps is implementation-defined. See
Section 2.8, “PSVIs in
XProc”
It is implementation-defined what PSVI properties, if any, are
produced by extension steps. See
Section 2.8, “PSVIs in XProc”
How outside values are specified for pipeline options on the
pipeline initially invoked by the processor is
implementation-defined. See
Section 2.10, “Options”
How outside values are specified for pipeline parameters on the
pipeline initially invoked by the processor is
implementation-defined. See
Section 2.11, “Parameters”
Support for pipeline documents written in XML 1.1 and pipeline
inputs and outputs that use XML 1.1 is implementation-defined. See
Section 3, “Syntax
Overview”
The semantics of p:pipeinfo elements are
implementation-defined. See
Section 3.7, “Processor
annotations”
The set of URI schemes actually supported is
implementation-defined. See
Section 3.11, “Common errors”
The presence of other compound steps is implementation-defined;
XProc provides no standard mechanism for defining them or
describing what they can contain. See
Section 4.8, “Extension Steps”
If the href attribute is not specified, the location of the log
file or files is implementation-defined. See
Section 5.5, “p:log”
How each document or sequence of documents is represented in a
p:log is implementation-defined. See
Section 5.5, “p:log”
The default value of any serialization options not specified on
a particular p:serialization element is implementation-defined. See
Section 5.6, “p:serialization”
When a declared step is evaluated directly by the XProc
processor (as opposed to occurring as an atomic step in some
container), how the input and output ports are connected to
documents is implementation-defined. See
Section 5.8, “p:declare-step”
The subpipeline may include declarations of additional steps
(e.g., other pipelines or other step types that are provided by a
particular implementation or in some implementation-defined way)
and import other pipelines. See
Section 5.8.2, “Declaring
pipelines”
Conformant processors must support directory paths whose scheme
is file. It is implementation-defined what other schemes are
supported by p:directory-list, and what the interpretation of
'directory', 'file' and 'contents' is for those schemes. See
Section 7.1.6, “p:directory-list”
Any file or directory determined to be special by the
p:directory-list step may be output using a c:other element but the
criteria for marking a file as special are implementation-defined.
See
Section 7.1.6, “p:directory-list”
Any attributes other than name on c:file, c:directory, or
c:other are implementation-defined. See
Section 7.1.6,
“p:directory-list”
The interpretation of auth-method values on c:request other
than “Basic” or “Digest” is implementation-defined. See
Section 7.1.10.1,
“Specifying a request”
An implementation may support encodings other than base64 but
these encodings and their names are implementation-defined. See
Section 7.1.10.2, “Request
Entity body conversion”
Pipeline authors that need to preserve cookies across several
p:http-request calls in the same pipeline or across multiple
invocations of the same or different pipelines will have to rely on
implementation-defined mechanisms. See
Section 7.1.10.3.2, “Cookies”
Behavior of p:unescape-markup for content-types other than
application/xml is implementation-defined. See
Section 7.1.26,
“p:unescape-markup”
If cwd is not specified, the current working directory is
implementation-defined. See
Section 7.2.1, “p:exec”
It is implementation-defined what other algorithms are
supported. See
Section 7.2.2,
“p:hash”
If the version is not specified, the version of UUID computed
is implementation-defined. See
Section 7.2.3, “p:uuid”
Support for other versions of UUID, and the mechanism by which
the necessary inputs are made available for computing other
versions, is implementation-defined. See
Section 7.2.3, “p:uuid”
It is implementation-defined if the documents supplied on the
schemas port are considered when resolving xs:include elements in
the schema documents provided. See
Section 7.2.6,
“p:validate-with-xml-schema”
Support for XQueryX is implementation-defined. See
Section 7.2.9,
“p:xquery”
Otherwise, the interpretation of the query is
implementation-defined. See
Section 7.2.9, “p:xquery”
The point in time returned as the current dateTime is
implementation-defined. See
Section 7.2.9, “p:xquery”
The implicit timezone is implementation-defined. See
Section 7.2.9,
“p:xquery”
The use of media type parameters on the content-type option is
implementation-defined. See
Section 7.2.10, “p:xsl-formatter”
If the content-type option is not specified, the output type is
implementation-defined. See
Section 7.2.10, “p:xsl-formatter”
A formatter may take any number of optional rendering
parameters via the step's parameters; such parameters are defined
by the XSL implementation used and are implementation-defined. See
Section 7.2.10, “p:xsl-formatter”
Implementations may support other method values but their
results are implementation-defined. See
Section 7.3, “Serialization
Options”
It is implementation-defined whether additional information
items and properties, particularly those made available in the
PSVI, are preserved between steps. See
Section A.3, “Infoset
Conformance”
A.2 Implementation-dependent
features
The following features are implementation-dependent:
The evaluation order of steps not connected to one another is
implementation-dependent. See
Section 2, “Pipeline Concepts”
Outside of a try/catch, the disposition of error messages is
implementation-dependent. See
Section 2.2, “Inputs and
Outputs”
Resolving a URI locally may involve resolvers of various sorts
and possibly appeal to implementation-dependent mechanisms such as
catalog files. See
Section 2.2.1, “External
Documents”
Whether (and when and how) the intermediate results that
pass between steps are ever written to a filesystem is
implementation-dependent. See
Section 2.2.1, “External
Documents”
The results of computing the union of namespaces in the
presence of conflicting declarations for a particular prefix are
implementation-dependent. See
Section 2.6.1.2, “Step XPath
Context”
The set of available documents (those that may be retrieved
with a URI) is implementation-dependent. See
Section 2.6.2.1, “Processor XPath
Context”
The set of available collections is implementation-dependent.
See
Section 2.6.2.1, “Processor XPath
Context”
The set of available documents (those that may be retrieved
with a URI) is implementation-dependent. See
Section 2.6.2.2, “Step XPath
Context”
Which steps are forbidden, what privileges are needed to access
resources, and under what circumstances these security constraints
apply is implementation-dependent. See
Section 2.12, “Security
Considerations”
The value of the p:episode system property in a use-when
expression is implementation-dependent. See
Section 3.9,
“Conditional Element Exclusion”
The results of testing for steps not in the XProc namespace in
a use-when expression are implementation-dependent. See
Section 3.9,
“Conditional Element Exclusion”
The ability of a step to read the p:log output of some former
step is implementation-dependent. See
Section 5.5, “p:log”
Implementations may use extension attributes to provide
implementation-dependent information about a declared step. See
Section 5.8.1, “Declaring atomic
steps”
If no content type was specified or is associated with the
resource, the inferred content type is implementation-dependent.
See
Section 5.14,
“p:data”
The interpretation of a multipart message inside another
multipart message is implementation-dependent. See
Section 7.1.10.4,
“Converting Response Entity Bodies”
If the IRI reference specified by the base-uri option on
p:make-absolute-uris is not valid, or if it is absent and the input
document has no base URI, the results are implementation-dependent.
See
Section 7.1.15,
“p:make-absolute-uris”
The order in which c:param elements occur in the c:param-set is
implementation-dependent. See
Section 7.1.18, “p:parameters”
How XSLT message termination errors are reported to the XProc
processor is implementation-dependent. See
Section 7.1.31, “p:xslt”
The specific exit status values returned by a process invoked
with p:exec are implementation-dependent. See
Section 7.2.1, “p:exec”
The set of available documents (those that may be retrieved
with a URI) is implementation-dependent. See
Section 7.2.9, “p:xquery”
The set of available collections is implementation-dependent.
See
Section 7.2.9,
“p:xquery”
A.3 Infoset Conformance
This specification conforms to the XML Information Set [Infoset]. The
information corresponding to the following information items and
properties must be available to the processor for the documents
that flow through the pipeline.
The Document Information Item with [base URI] and [children]
properties.
Element Information Items with [base URI], [children], [attributes],
[in-scope namespaces], [prefix], [local name], [namespace name], and
[parent] properties.
Attribute Information Items with [namespace name], [prefix],
[local name], [normalized value], [attribute type], and
[owner element] properties.
Character Information Items with [character code], [parent], and,
optionally, [element content whitespace] properties.
Processing Instruction Information Items with [base URI], [target],
[content], and [parent] properties.
Comment Information Items with [content] and [parent] properties.
Namespace Information Items with [prefix] and [namespace name]
properties.
It is implementation-defined whether additional information items and
properties, particularly those made available in the PSVI, are
preserved between steps.
B References
B.1 Normative References
XProc Requirements
XML Processing Model Requirements and Use
Cases
. Alex Milowski, editor. W3C Working Draft
11 April 2006.
Infoset
XML Information Set (Second
Edition)
. John Cowan, Richard Tobin, editors. W3C
Working Group Note 04 February 2004.
XML 1.0
Extensible Markup Language (XML) 1.0 (Fifth
Edition)
. Tim Bray, Jean Paoli, C. M.
Sperberg-McQueen, et al., editors. W3C Recommendation 26 November
2008.
Namespaces 1.0
Namespaces in XML 1.0 (Third
Edition)
. Tim Bray, Dave Hollander, Andrew
Layman, et al., editors. W3C Recommendation 8 December 2009.
XML 1.1
Extensible Markup Language (XML) 1.1 (Second
Edition)
. Tim Bray, Jean Paoli, C. M.
Sperberg-McQueen, et al., editors. W3C Recommendation 16 August
2006.
Namespaces 1.1
Namespaces in XML 1.1 (Second
Edition)
. Tim Bray, Dave Hollander, Andrew
Layman, et al., editors. W3C Recommendation 16 August 2006.
XPath 1.0
XML Path
Language (XPath) Version 1.0
. James Clark and
Steve DeRose, editors. W3C Recommendation. 16 November 1999.
XSLT 1.0
XSL
Transformations (XSLT) Version 1.0
. James Clark,
editor. W3C Recommendation. 16 November 1999.
XPath 2.0
XML
Path Language (XPath) 2.0
. Anders Berglund, Scott
Boag, Don Chamberlin, et al., editors. W3C Recommendation. 23
January 2007.
XQuery 1.0 and XPath 2.0 Data Model
(XDM)
XQuery 1.0 and XPath 2.0 Data Model
(XDM)
. Mary Fernández, Ashok Malhotra, Jonathan
Marsh, et al., editors. W3C
Recommendation. 23 January 2007.
XPath 2.0 Functions and Operators
XQuery 1.0 and XPath 2.0 Functions and
Operators
. Ashok Malhotra, Jim Melton, and Norman
Walsh, editors. W3C Recommendation. 23 January 2007.
XSLT 2.0
XSL
Transformations (XSLT) Version 2.0
. Michael Kay,
editor. W3C Recommendation. 23 January 2007.
XSL 1.1
Extensible Stylesheet Language (XSL) Version
1.1
. Anders Berglund, editor. W3C Recommendation.
5 December 2006.
XQuery 1.0
XQuery
1.0: An XML Query Language
. Scott Boag, Don
Chamberlin, Mary Fernández, et al., editors. W3C Recommendation.
23 January 2007.
RELAX NG
ISO/IEC JTC 1/SC 34.
ISO/IEC 19757-2:2008(E) Document
Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based
validation -- RELAX NG
2008.
RELAX NG Compact Syntax
ISO/IEC JTC 1/SC 34.
ISO/IEC 19757-2:2003/Amd
1:2006 Document Schema Definition Languages (DSDL) — Part 2:
Grammar-based validation — RELAX NG AMENDMENT 1 Compact
Syntax
2006.
RELAX NG DTD Compatibility
RELAX NG DTD
Compatibility
. OASIS Committee Specification. 3
December 2001.
Schematron
ISO/IEC JTC 1/SC 34.
ISO/IEC 19757-3:2006(E) Document
Schema Definition Languages (DSDL) — Part 3: Rule-based validation
— Schematron
2006.
W3C XML Schema: Part 1
XML Schema Part 1: Structures Second
Edition
. Henry S. Thompson, David Beech, Murray
Maloney, et al., editors. World Wide Web Consortium, 28 October
2004.
W3C XML Schema: Part 2
XML Schema Part 2: Datatypes Second
Edition
. Paul V. Biron and Ashok Malhotra,
editors. World Wide Web Consortium, 28 October 2004.
xml:id
xml:id
Version 1.0
. Jonathan Marsh, Daniel Veillard, and
Norman Walsh, editors. W3C Recommendation. 9 September 2005.
XInclude
XML
Inclusions (XInclude) Version 1.0 (Second
Edition)
. Jonathan Marsh, David Orchard, and
Daniel Veillard, editors. W3C Recommendation. 15 November 2006.
XML Base
XML
Base (Second Edition)
. Jonathan Marsh and Richard
Tobin, editors. W3C Recommendation. 28 January 2009.
XPointer Framework
XPointer Framework
. Paul
Grosso, Eve Maler, Jonathan Marsh, et al., editors. W3C
Recommendation. 25 March 2003.
XPointer element() Scheme
XPointer element() Scheme
. Paul
Grosso, Eve Maler, Jonathan Marsh, et al., editors. W3C
Recommendation. 25 March 2003.
Serialization
XSLT 2.0 and XQuery 1.0
Serialization
. Scott Boag, Michael Kay, Joanne
Tong, Norman Walsh, and Henry Zongaro, editors. W3C Recommendation.
23 January 2007.
MD5
RFC 1321: The MD5 Message-Digest
Algorithm
. R. Rivest. Network Working Group,
IETF, April 1992.
RFC 1521
RFC 1521: MIME (Multipurpose Internet Mail
Extensions) Part One: Mechanisms for Specifying and Describing the
Format of Internet Message Bodies
. N. Borenstein,
N. Freed, editors. Internet Engineering Task Force. September,
1993.
RFC 2119
Key words for use in RFCs to Indicate Requirement
Levels
. S. Bradner. Network Working Group, IETF,
Mar 1997.
RFC 2396
Uniform Resource Identifiers (URI): Generic
Syntax
. T. Berners-Lee, R. Fielding, and L.
Masinter. Network Working Group, IETF, Aug 1998.
RFC 2616
RFC 2616: Hypertext Transfer Protocol —
HTTP/1.1
. R. Fielding, J. Gettys, J. Mogul, et al.,
editors. Internet Engineering Task Force. June, 1999.
RFC 2617
RFC 2617: HTTP Authentication: Basic and Digest
Access Authentication
. J. Franks, P.
Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, L.
Stewart. June, 1999.
RFC 3023
RFC 3023: XML Media Types
. M.
Murata, S. St. Laurent, and D. Kohn, editors. Internet Engineering
Task Force. January, 2001.
RFC 3548
RFC 3548: The Base16, Base32, and Base64 Data
Encodings
. S. Josefsson, Editor. Internet
Engineering Task Force. July, 2003.
RFC 3986
RFC 3986: Uniform Resource Identifier (URI):
Generic Syntax
. T. Berners-Lee, R. Fielding, and
L. Masinter, editors. Internet Engineering Task Force. January,
2005.
RFC 3987
RFC 3987: Internationalized Resource Identifiers
(IRIs)
. M. Duerst and M. Suignard, editors.
Internet Engineering Task Force. January, 2005.
Unicode TR#17
Unicode Technical Report #17: Character Encoding
Model
. Ken Whistler, Mark Davis, and Asmus
Freytag, authors. The Unicode Consortium. 11 November 2008.
IANA Media Types
IANA MIME Media Types
. Internet
Engineering Task Force.
HTML Tidy
HTML
Tidy Library Project
. SourceForge project.
TagSoup
TagSoup - Just Keep On
Truckin'
. John Cowan.
UUID
ITU X.667: Information technology - Open Systems
Interconnection - Procedures for the operation of OSI Registration
Authorities: Generation and registration of Universally Unique
Identifiers (UUIDs) and their use as ASN.1 Object Identifier
components
. 2004.
SHA1
Federal Information Processing Standards
Publication 180-1: Secure Hash Standard
1995.
B.2 Informative References
RFC 4122
RFC 4122: A Universally Unique IDentifier (UUID)
URN Namespace
. P. Leach and M. Mealling, editors.
Internet Engineering Task Force. July, 2005.
CRC32
“32-Bit Cyclic Redundancy Codes for Internet Applications”,
The International Conference on Dependable Systems and Networks:
459. doi:10.1109/DSN.2002.1028931. P. Koopman. June 2002.
C Glossary
Namespaces in XML
Unless otherwise noted, the term Namespaces in XML refers equally to
[Namespaces 1.0] and [Namespaces 1.1].
XML
XProc is intended to work equally well with [
XML 1.0
] and [
XML 1.1
]. Unless otherwise
noted, the term “
XML
” refers equally to
both versions.
ancestors
The
ancestors
of a step, if it has
any, are its
container
and the ancestors of its
container.
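The recursive definition of ancestors can be sketched in Python. This is only an illustration: the Step class and its container attribute are hypothetical names chosen for the sketch and are not defined by this specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """Hypothetical stand-in for a pipeline step; not an XProc API."""
    name: str
    container: Optional["Step"] = None  # None for an outermost step


def ancestors(step: Step) -> List[Step]:
    """Return the step's container followed by the container's ancestors."""
    if step.container is None:
        return []
    return [step.container] + ancestors(step.container)


# A step nested two levels deep has two ancestors, nearest first.
pipeline = Step("main")
loop = Step("loop", container=pipeline)
transform = Step("transform", container=loop)
print([s.name for s in ancestors(transform)])
```

A step without a container yields an empty list, which corresponds to the "if it has any" qualification in the definition.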
atomic
step
An atomic step is a step that performs a unit of XML processing, such
as XInclude or transformation, and has no internal subpipeline.
bag-merger
The
bag-merger
of two or more bags
(where a bag is an unordered list or, equivalently, something like
a set except that it may contain duplicates) is a bag constructed
by starting with an empty bag and adding each member of each of the
input bags in turn to it. It follows that the cardinality of the
result is the sum of the cardinality of all the input bags.
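A minimal Python sketch of the bag-merger, assuming that bag members are hashable and that a bag can be modelled with collections.Counter (a multiset). The function name bag_merger is chosen for this illustration only.

```python
from collections import Counter
from typing import Hashable, Iterable


def bag_merger(*bags: Iterable[Hashable]) -> Counter:
    """Start with an empty bag and add each member of each input bag in turn."""
    merged: Counter = Counter()
    for bag in bags:
        for member in bag:
            merged[member] += 1  # duplicates are kept, order is not
    return merged


first = ["x", "y", "x"]
second = ["y", "z"]
result = bag_merger(first, second)

# The cardinality of the result is the sum of the input cardinalities.
print(sum(result.values()) == len(first) + len(second))
```

Unlike a set union, the merger never discards duplicates, which is why the cardinalities simply add up.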
by
source
A document is specified
by source
if
it references a specific port on another step.
by URI
A document is specified
by URI
if it
is referenced with a URI.
compound
step
A compound step is a step that contains a subpipeline.
connection
A connection associates an input or output port with some data source.
contained
steps
The steps that occur directly within, or within non-step
wrappers directly within, a step are called that step's
contained steps
. In other words, “container” and
“contained steps” are inverse relationships.
container
A compound step or multi-container step is a
container
for the steps directly within it or
within non-step wrappers directly within it.
declared
inputs
The input ports declared on a step are its declared inputs.
declared
options
The options declared on a step are its declared options.
declared
outputs
The output ports declared on a step are its declared outputs.
default readable port
The
default readable port
, which may
be undefined, is a specific step name/port name pair from the set
of readable ports.
dynamic
error
A dynamic error is one which occurs while a pipeline is being
evaluated.
empty
environment
The
empty environment
contains no
readable ports, an undefined default readable port and no in-scope
bindings.
empty
sequence
An
empty sequence
of documents is
specified with the
p:empty
element.
environment
The
environment
is a
context-dependent collection of information available within
subpipelines.
extension attribute
An element from the XProc namespace may have any attribute not from
the XProc namespace, provided that the expanded-QName of the attribute
has a non-null namespace URI. Such an attribute is called an
extension attribute.
in-scope
bindings
The
in-scope bindings
are a set of
name-value pairs, based on
option
and
variable
bindings.
inherited environment
The inherited environment of a contained step is an environment that
is the same as the environment of its container with the standard
modifications.
inline
document
An
inline document
is specified
directly in the body of the element to which it connects.
last
step
The
last step
in a subpipeline is its
last step in document order.
matches
A step
matches
its signature if and
only if it specifies an input for each declared input, it specifies
no inputs that are not declared, it specifies an option for each
option that is declared to be required, and it specifies no options
that are not declared.
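The four conditions in this definition can be expressed as set comparisons. The sketch below is illustrative only: the Signature class and the matches function are hypothetical names, and the model ignores default connections and other details that a real processor would handle.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Signature:
    """Hypothetical model of a step signature; not an XProc API."""
    inputs: FrozenSet[str]
    required_options: FrozenSet[str]
    optional_options: FrozenSet[str] = frozenset()


def matches(sig: Signature, inputs: Set[str], options: Set[str]) -> bool:
    """True if and only if the specified inputs and options fit the signature."""
    declared_options = sig.required_options | sig.optional_options
    return (
        inputs == sig.inputs                  # every declared input, no others
        and sig.required_options <= options   # every required option is given
        and options <= declared_options       # no undeclared options
    )


# Modelled loosely on a step with one input and one required option.
sig = Signature(inputs=frozenset({"source"}),
                required_options=frozenset({"match"}))
print(matches(sig, {"source"}, {"match"}))
print(matches(sig, {"source"}, set()))
```

The first call satisfies all four conditions; the second fails because the required option is missing.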
multi-container step
A multi-container step is a step that contains several alternate
subpipelines.
namespace
fixup
To produce a serializable XML document, the XProc processor must
sometimes add additional namespace nodes, perhaps even renaming
prefixes, to satisfy the constraints of Namespaces in XML. This
process is referred to as namespace fixup.
option
An option is a name/value pair where the name is an expanded name and
the value must be a string or xs:untypedAtomic.
parameter
A parameter is a name/value pair where the name is an expanded name
and the value must be a string or xs:untypedAtomic.
parameter input port
A parameter input port is a distinguished kind of input port which
accepts (only) dynamically constructed parameter name/value pairs.
pipeline
A pipeline is a set of connected steps, with outputs of one step
flowing into inputs of another.
primary parameter input
port
If a step has a parameter input port which is explicitly marked
“primary='true'”, or if it has exactly one parameter input port and
that port is not explicitly marked “primary='false'”, then that
parameter input port is the primary parameter input port of the step.
primary
input port
If a step has a document input port which is explicitly marked
“primary='true'”, or if it has exactly one document input port and
that port is not explicitly marked “primary='false'”, then that input
port is the primary input port of the step.
primary
output port
If a step has a document output port which is explicitly marked
“primary='true'”, or if it has exactly one document output port and
that port is not explicitly marked “primary='false'”, then that output
port is the primary output port of the step.
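The three "primary" definitions apply the same rule to different kinds of port. A minimal Python sketch of that rule follows; the Port class and the primary_port function are hypothetical names used only for illustration, and the caller is assumed to pass all ports of a single kind (document inputs, document outputs, or parameter inputs) for one step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    """Hypothetical model of a declared port; not an XProc API."""
    name: str
    primary: Optional[bool] = None  # None means no primary attribute was given


def primary_port(ports: List[Port]) -> Optional[Port]:
    """Return the primary port among ports of one kind, or None if there is none."""
    # Case 1: a port explicitly marked primary='true'.
    for port in ports:
        if port.primary is True:
            return port
    # Case 2: exactly one port, and it is not explicitly marked primary='false'.
    if len(ports) == 1 and ports[0].primary is not False:
        return ports[0]
    return None


single = primary_port([Port("source")])
explicit = primary_port([Port("source", True), Port("alternate")])
refused = primary_port([Port("result", False)])
ambiguous = primary_port([Port("a"), Port("b")])
print(single.name, explicit.name, refused, ambiguous)
```

A lone port marked primary='false' has no primary port at all, and two unmarked ports leave the primary undefined rather than picking the first.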
readable
ports
The
readable ports
are a set of step
name/port name pairs.
signature
The
signature
of a step is the set of
inputs, outputs, and options that it is declared to accept.
specified
options
The options on a step which have specified values, either because a
p:with-option element specifies a value or because the declaration
included a default value, are its specified options.
static
error
A static error is one which can be detected before pipeline
evaluation is even attempted.
step
A step is the basic computational unit of a pipeline.
step type
exports
The
step type exports
of an XProc
element, against the background of a set of URIs of resources
already visited (call this set
Visited
), are defined by
cases.
subpipeline
Sibling steps (and the connections between them) form a subpipeline.
variable
A variable is a name/value pair where the name is an expanded name
and the value must be a string or xs:untypedAtomic.
visible
If two names are in the same scope, we say that they are
visible
to each other.
D Pipeline Language Summary
This appendix summarizes the XProc pipeline language. Machine
readable descriptions of this language are available in
RELAX NG
(and the RELAX NG
compact syntax
),
W3C XML Schema
, and
DTD
syntaxes.
<p:pipeline
  name? = NCName
  type? = QName
  psvi-required? = boolean
  xpath-version? = string
  exclude-inline-prefixes? = prefix list
  version? = string>
    (p:input |
     p:output |
     p:option |
     p:log |
     p:serialization)*,
    (p:declare-step |
     p:pipeline |
     p:import)*,
    subpipeline
</p:pipeline>

<p:for-each
  name? = NCName>
    ((p:iteration-source? &
      (p:output |
       p:log)*),
     subpipeline)
</p:for-each>

<p:viewport
  name? = NCName
  match = XSLTMatchPattern>
    ((p:viewport-source? &
      p:output? &
      p:log?),
     subpipeline)
</p:viewport>

<p:choose
  name? = NCName>
    (p:xpath-context?,
     p:variable*,
     p:when*,
     p:otherwise?)
</p:choose>

<p:xpath-context>
    (p:empty |
     p:pipe |
     p:document |
     p:inline |
     p:data)
</p:xpath-context>

<p:when
  test = XPathExpression>
    (p:xpath-context?,
     (p:output |
      p:log)*,
     subpipeline)
</p:when>

<p:otherwise>
    ((p:output |
      p:log)*,
     subpipeline)
</p:otherwise>

<p:group
  name? = NCName>
    ((p:output |
      p:log)*,
     subpipeline)
</p:group>

<p:try
  name? = NCName>
    (p:variable*,
     p:group,
     p:catch)
</p:try>

<p:catch
  name? = NCName>
    ((p:output |
      p:log)*,
     subpipeline)
</p:catch>

<p:atomic-step
  name? = NCName>
    (p:input |
     p:with-option |
     p:with-param |
     p:log)*
</p:atomic-step>

<pfx:atomic-step
  name? = NCName>
    (p:input |
     p:with-option |
     p:with-param |
     p:log)*
</pfx:atomic-step>

<p:input
  port = NCName
  sequence? = boolean
  primary? = boolean
  kind? = "document"
  select? = XPathExpression>
    (p:empty |
     (p:document |
      p:inline |
      p:data)+)?
</p:input>

<p:input
  port = NCName
  select? = XPathExpression>
    (p:empty |
     (p:pipe |
      p:document |
      p:inline |
      p:data)+)?
</p:input>

<p:input
  port = NCName
  sequence? = boolean
  primary? = boolean
  kind = "parameter">
    (p:empty |
     (p:document |
      p:inline)+)?
</p:input>

<p:iteration-source
  select? = XPathExpression>
    (p:empty |
     (p:pipe |
      p:document |
      p:inline |
      p:data)+)?
</p:iteration-source>

<p:viewport-source>
    (p:pipe |
     p:document |
     p:inline |
     p:data)?
</p:viewport-source>

<p:output
  port = NCName
  sequence? = boolean
  primary? = boolean />

<p:output
  port = NCName
  sequence? = boolean
  primary? = boolean>
    (p:empty |
     (p:pipe |
      p:document |
      p:inline |
      p:data)+)?
</p:output>

<p:log
  port = NCName
  href? = anyURI />

<p:serialization
  port = NCName
  byte-order-mark? = boolean
  cdata-section-elements? = NMTOKENS
  doctype-public? = string
  doctype-system? = string
  encoding? = string
  escape-uri-attributes? = boolean
  include-content-type? = boolean
  indent? = boolean
  media-type? = string
  method? = QName
  normalization-form? = NFC|NFD|NFKC|NFKD|fully-normalized|none|xs:NMTOKEN
  omit-xml-declaration? = boolean
  standalone? = true|false|omit
  undeclare-prefixes? = boolean
  version? = string />

<p:variable
  name = QName
  select = XPathExpression>
    ((p:empty |
      p:pipe |
      p:document |
      p:inline |
      p:data)? &
     p:namespaces*)
</p:variable>

<p:option
  name = QName
  required? = boolean />

<p:option
  name = QName
  required? = boolean
  select = XPathExpression />

<p:with-option
  name = QName
  select = XPathExpression>
    ((p:empty |
      p:pipe |
      p:document |
      p:inline |
      p:data)? &
     p:namespaces*)
</p:with-option>

<p:with-param
  name = QName
  select = XPathExpression
  port? = NCName>
    ((p:empty |
      p:pipe |
      p:document |
      p:inline |
      p:data)? &
     p:namespaces*)
</p:with-param>

<p:namespaces
  binding? = QName
  element? = XPathExpression
  except-prefixes? = prefix list />

<p:declare-step
  name? = NCName
  type? = QName
  psvi-required? = boolean
  xpath-version? = string
  exclude-inline-prefixes? = prefix list
  version? = string>
    (p:input |
     p:output |
     p:option |
     p:log |
     p:serialization)*,
    ((p:declare-step |
      p:pipeline |
      p:import)*,
     subpipeline)?
</p:declare-step>

<p:library
  psvi-required? = boolean
  xpath-version? = string
  exclude-inline-prefixes? = prefix list
  version? = string>
    (p:import |
     p:declare-step |
     p:pipeline)*
</p:library>

<p:import
  href = anyURI />

<p:pipe
  step = NCName
  port = NCName />

<p:inline
  exclude-inline-prefixes? = prefix list>
    anyElement
</p:inline>

<p:document
  href = anyURI />

<p:data
  href = anyURI
  wrapper? = QName
  wrapper-prefix? = string
  wrapper-namespace? = string
  content-type? = string />

<p:empty />

<p:documentation>
    any-well-formed-content*
</p:documentation>

<p:pipeinfo>
    any-well-formed-content*
</p:pipeinfo>

The core steps are also summarized here.
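<p:declare-step type="p:add-attribute">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="attribute-name" required="true"/>
  <p:option name="attribute-prefix"/>
  <p:option name="attribute-namespace"/>
  <p:option name="attribute-value" required="true"/>
</p:declare-step>

<p:declare-step type="p:add-xml-base">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="all" select="'false'"/>
  <p:option name="relative" select="'true'"/>
</p:declare-step>

<p:declare-step type="p:compare">
  <p:input port="source" primary="true"/>
  <p:input port="alternate"/>
  <p:output port="result" primary="false"/>
  <p:option name="fail-if-not-equal" select="'false'"/>
</p:declare-step>

<p:declare-step type="p:count">
  <p:input port="source" sequence="true"/>
  <p:output port="result"/>
  <p:option name="limit" select="0"/>
</p:declare-step>

<p:declare-step type="p:delete">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>

<p:declare-step type="p:directory-list">
  <p:output port="result"/>
  <p:option name="path" required="true"/>
  <p:option name="include-filter"/>
  <p:option name="exclude-filter"/>
</p:declare-step>

<p:declare-step type="p:error">
  <p:input port="source" primary="false"/>
  <p:output port="result" sequence="true"/>
  <p:option name="code" required="true"/>
  <p:option name="code-prefix"/>
  <p:option name="code-namespace"/>
</p:declare-step>

<p:declare-step type="p:escape-markup">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="cdata-section-elements" select="''"/>
  <p:option name="doctype-public"/>
  <p:option name="doctype-system"/>
  <p:option name="escape-uri-attributes" select="'false'"/>
  <p:option name="include-content-type" select="'true'"/>
  <p:option name="indent" select="'false'"/>
  <p:option name="media-type"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="omit-xml-declaration" select="'true'"/>
  <p:option name="standalone" select="'omit'"/>
  <p:option name="undeclare-prefixes"/>
  <p:option name="version" select="'1.0'"/>
</p:declare-step>

<p:declare-step type="p:filter">
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>
  <p:option name="select" required="true"/>
</p:declare-step>

<p:declare-step type="p:http-request">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="byte-order-mark"/>
  <p:option name="cdata-section-elements" select="''"/>
  <p:option name="doctype-public"/>
  <p:option name="doctype-system"/>
  <p:option name="encoding"/>
  <p:option name="escape-uri-attributes" select="'false'"/>
  <p:option name="include-content-type" select="'true'"/>
  <p:option name="indent" select="'false'"/>
  <p:option name="media-type"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="normalization-form" select="'none'"/>
  <p:option name="omit-xml-declaration" select="'true'"/>
  <p:option name="standalone" select="'omit'"/>
  <p:option name="undeclare-prefixes"/>
  <p:option name="version" select="'1.0'"/>
</p:declare-step>

<p:declare-step type="p:identity">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
</p:declare-step>

<p:declare-step type="p:insert">
  <p:input port="source" primary="true"/>
  <p:input port="insertion" sequence="true"/>
  <p:output port="result"/>
  <p:option name="match" select="'/*'"/>
  <p:option name="position" required="true"/>
</p:declare-step>

<p:declare-step type="p:label-elements">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="attribute" select="'xml:id'"/>
  <p:option name="attribute-prefix"/>
  <p:option name="attribute-namespace"/>
  <p:option name="label" select="'concat(&quot;_&quot;,$p:index)'"/>
  <p:option name="match" select="'*'"/>
  <p:option name="replace" select="'true'"/>
</p:declare-step>

<p:declare-step type="p:load">
  <p:output port="result"/>
  <p:option name="href" required="true"/>
  <p:option name="dtd-validate" select="'false'"/>
</p:declare-step>

<p:declare-step type="p:make-absolute-uris">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="base-uri"/>
</p:declare-step>

<p:declare-step type="p:namespace-rename">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="from"/>
  <p:option name="to"/>
  <p:option name="apply-to" select="'all'"/>
</p:declare-step>

<p:declare-step type="p:pack">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="alternate" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:option name="wrapper" required="true"/>
  <p:option name="wrapper-prefix"/>
  <p:option name="wrapper-namespace"/>
</p:declare-step>

<p:declare-step type="p:parameters">
  <p:input port="parameters" kind="parameter" primary="false"/>
  <p:output port="result" primary="false"/>
</p:declare-step>

<p:declare-step type="p:rename">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="new-name" required="true"/>
  <p:option name="new-prefix"/>
  <p:option name="new-namespace"/>
</p:declare-step>

<p:declare-step type="p:replace">
  <p:input port="source" primary="true"/>
  <p:input port="replacement"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>

<p:declare-step type="p:set-attributes">
  <p:input port="source" primary="true"/>
  <p:input port="attributes"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>

<p:declare-step type="p:sink">
  <p:input port="source" sequence="true"/>
</p:declare-step>

<p:declare-step type="p:split-sequence">
  <p:input port="source" sequence="true"/>
  <p:output port="matched" sequence="true" primary="true"/>
  <p:output port="not-matched" sequence="true"/>
  <p:option name="initial-only" select="'false'"/>
  <p:option name="test" required="true"/>
</p:declare-step>

<p:declare-step type="p:store">
  <p:input port="source"/>
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>
  <p:option name="byte-order-mark"/>
  <p:option name="cdata-section-elements" select="''"/>
  <p:option name="doctype-public"/>
  <p:option name="doctype-system"/>
  <p:option name="encoding"/>
  <p:option name="escape-uri-attributes" select="'false'"/>
  <p:option name="include-content-type" select="'true'"/>
  <p:option name="indent" select="'false'"/>
  <p:option name="media-type"/>
  <p:option name="method" select="'xml'"/>
  <p:option name="normalization-form" select="'none'"/>
  <p:option name="omit-xml-declaration" select="'true'"/>
  <p:option name="standalone" select="'omit'"/>
  <p:option name="undeclare-prefixes"/>
  <p:option name="version" select="'1.0'"/>
</p:declare-step>

<p:declare-step type="p:string-replace">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
  <p:option name="replace" required="true"/>
</p:declare-step>

<p:declare-step type="p:unescape-markup">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="namespace"/>
  <p:option name="content-type" select="'application/xml'"/>
  <p:option name="encoding"/>
  <p:option name="charset"/>
</p:declare-step>

<p:declare-step type="p:unwrap">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="match" required="true"/>
</p:declare-step>

<p:declare-step type="p:wrap">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="wrapper" required="true"/>
  <p:option name="wrapper-prefix"/>
  <p:option name="wrapper-namespace"/>
  <p:option name="match" required="true"/>
  <p:option name="group-adjacent"/>
</p:declare-step>

<p:declare-step type="p:wrap-sequence">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:option name="wrapper" required="true"/>
  <p:option name="wrapper-prefix"/>
  <p:option name="wrapper-namespace"/>
  <p:option name="group-adjacent"/>
</p:declare-step>

<p:declare-step type="p:xinclude">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:option name="fixup-xml-base" select="'false'"/>
  <p:option name="fixup-xml-lang" select="'false'"/>
</p:declare-step>

<p:declare-step type="p:xslt">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="stylesheet"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" primary="true"/>
  <p:output port="secondary" sequence="true"/>
  <p:option name="initial-mode"/>
  <p:option name="template-name"/>
  <p:option name="output-base-uri"/>
  <p:option name="version"/>
</p:declare-step>
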
As are the optional steps.
="
p:exec
="
source
primary
="
true
sequence
="
true
/>
="
result
primary
="
true
/>
="
errors
/>
="
exit-status
/>
="
command
required
="
true
/>
="
args
select
="
''
/>
="
cwd
/>
="
source-is-xml
select
="
'true'
/>
="
result-is-xml
select
="
'true'
/>
="
wrap-result-lines
select
="
'false'
/>
="
errors-is-xml
select
="
'false'
/>
="
wrap-error-lines
select
="
'false'
/>
="
path-separator
/>
="
failure-threshold
/>
="
arg-separator
select
="
/>
="
byte-order-mark
/>
="
cdata-section-elements
select
="
''
/>
="
doctype-public
/>
="
doctype-system
/>
="
encoding
/>
="
escape-uri-attributes
select
="
'false'
/>
="
include-content-type
select
="
'true'
/>
="
indent
select
="
'false'
/>
="
media-type
/>
="
method
select
="
'xml'
/>
="
normalization-form
select
="
'none'
/>
="
omit-xml-declaration
select
="
'true'
/>
="
standalone
select
="
'omit'
/>
="
undeclare-prefixes
/>
="
version
select
="
'1.0'
/>
="
p:hash
="
source
primary
="
true
/>
="
result
/>
="
parameters
kind
="
parameter
/>
="
value
required
="
true
/>
="
algorithm
required
="
true
/>
="
match
required
="
true
/>
="
version
/>
="
p:uuid
="
source
primary
="
true
/>
="
result
/>
="
match
required
="
true
/>
="
version
/>
="
p:validate-with-relax-ng
="
source
primary
="
true
/>
="
schema
/>
="
result
/>
="
dtd-attribute-values
select
="
'false'
/>
="
dtd-id-idref-warnings
select
="
'false'
/>
="
assert-valid
select
="
'true'
/>
="
p:validate-with-schematron
="
parameters
kind
="
parameter
/>
="
source
primary
="
true
/>
="
schema
/>
="
result
primary
="
true
/>
="
report
sequence
="
true
/>
="
phase
select
="
'#ALL'
/>
="
assert-valid
select
="
'true'
/>
="
p:validate-with-xml-schema
="
source
primary
="
true
/>
="
schema
sequence
="
true
/>
="
result
/>
="
use-location-hints
select
="
'false'
/>
="
try-namespaces
select
="
'false'
/>
="
assert-valid
select
="
'true'
/>
="
mode
select
="
'strict'
/>
="
p:www-form-urldecode
="
result
/>
="
value
required
="
true
/>
<p:declare-step type="p:www-form-urlencode">
  <p:input port="source" primary="true"/>
  <p:output port="result"/>
  <p:input port="parameters" kind="parameter"/>
  <p:option name="match" required="true"/>
</p:declare-step>
<p:declare-step type="p:xquery">
  <p:input port="source" sequence="true" primary="true"/>
  <p:input port="query"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true"/>
</p:declare-step>
<p:declare-step type="p:xsl-formatter">
  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>
  <p:option name="content-type"/>
</p:declare-step>
And the step vocabulary elements.
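<c:param
  name = QName
  namespace? = anyURI
  value = string />

<c:param-set>
    c:param*
</c:param-set>

<c:data
  content-type? = string
  charset? = string
  encoding? = string>
    string
</c:data>

<c:result>
    string
</c:result>

<c:directory
  name = string>
    (c:file |
     c:directory |
     c:other)*
</c:directory>

<c:file
  name = string />

<c:other
  name = string />

<c:request
  method = NCName
  href? = anyURI
  detailed? = boolean
  status-only? = boolean
  username? = string
  password? = string
  auth-method? = string
  send-authorization? = boolean
  override-content-type? = string>
    (c:header*,
     (c:multipart |
      c:body)?)
</c:request>

<c:header
  name = string
  value = string />

<c:multipart
  content-type = string
  boundary = string>
    c:body+
</c:multipart>

<c:body
  content-type = string
  encoding? = string
  id? = string
  description? = string
  disposition? = string>
    anyElement*
</c:body>

<c:response
  status? = integer>
    (c:header*,
     (c:multipart |
      c:body)?)
</c:response>

<c:line>
    string
</c:line>

<c:query>
    string
</c:query>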
E List of Error Codes
The following error codes are defined by this specification.
E.1 Static Errors
The following static errors are defined:
Static Errors
err:XS0001
It is a static error if there are any loops in the connections
between steps: no step can be connected to itself nor can there be
any sequence of connections through other steps that leads back to
itself.
See:
Connections
err:XS0002
All steps in the same scope must have unique names: it is a
static error if two steps with the same name appear in the same
scope.
See:
Scoping of Names
err:XS0003
It is a static error if any declared input is not connected.
See:
Inputs and Outputs
err:XS0004
It is a static error if an option or variable declaration
duplicates the name of any other option or variable in the same
environment.
See:
Scoping of Names
p:option
p:with-option
err:XS0005
It is a static error if the primary output port of any step is
not connected.
See:
Inputs and Outputs
err:XS0006
It is a static error if the primary output port has no explicit
connection and the last step in the subpipeline does not have a
primary output port.
See:
p:for-each
p:viewport
Declaring pipelines
err:XS0007
It is a static error if two subpipelines in a p:choose declare
different outputs.
See:
p:choose
err:XS0008
It is a static error if any element in the XProc namespace has
attributes not defined by this specification unless they are
extension attributes.
See:
Common errors
err:XS0009
It is a static error if the p:group and p:catch subpipelines
declare different outputs.
See:
p:try
err:XS0010
It is a static error if a pipeline contains a step whose
specified inputs, outputs, and options do not match the signature
for steps of that type.
See:
Extension Steps
err:XS0011
It is a static error to identify two ports with the same name on
the same step.
See:
Document Inputs
Parameter Inputs
p:output
err:XS0014
It is a static error to identify more than one output port as
primary.
See:
p:output
err:XS0015
It is a static error if a compound step has no contained
steps.
See:
Common errors
err:XS0017
It is a static error to specify that an option is both required
and has a default value.
See:
p:option
err:XS0018
If an option is required, it is a static error to invoke the
step without specifying a value for that option.
See:
p:option
err:XS0019
It is a static error for a variable's document connection to
refer to the output port of any step in the surrounding container's
contained steps.
See:
p:variable
err:XS0020
It is a static error if the binding attribute on p:namespaces is
specified and its value is not the name of an in-scope binding.
See:
Namespaces on variables,
options, and parameters
err:XS0022
In all cases except the p:output of a compound step, it is a
static error if the port identified by a p:pipe is not in the
readable ports of the step that contains the p:pipe.
See:
p:pipe
err:XS0024
It is a static error if the content of the p:inline element does
not consist of exactly one element, optionally preceded and/or
followed by any number of processing instructions, comments or
whitespace characters.
See:
p:inline
err:XS0025
It is a static error if the expanded-QName value of the type
attribute is in no namespace or in the XProc namespace.
See:
p:declare-step
err:XS0026
It is a static error if the port specified on the p:log is not
the name of an output port on the step in which it appears or if
more than one p:log element is applied to the same port.
See:
p:log
err:XS0027
It is a static error if an option is specified with both the
shortcut form and the long form.
See:
Syntactic Shortcut for Option
Values
err:XS0028
It is a static error to declare an option or variable in the
XProc namespace.
See:
p:variable
p:option
err:XS0029
It is a static error to specify a connection for a p:output
inside a p:declare-step for an atomic step.
See:
p:output
err:XS0030
It is a static error to specify that more than one input port is
the primary.
See:
Document Inputs
Parameter Inputs
err:XS0031
It is a static error to use an option on an atomic step that is
not declared on steps of that type.
See:
Syntactic Shortcut for Option
Values
p:with-option
err:XS0032
It is a static error if no connection is provided and the
default readable port is undefined.
See:
Document Inputs
err:XS0033
It is a static error to specify any kind of input other than
“document” or “parameter”.
See:
Parameter Inputs
err:XS0034
It is a static error if the specified port is not a parameter
input port or if no port is specified and the step does not have a
primary parameter input port.
See:
p:with-param
err:XS0035
It is a static error if the declaration of a parameter input
port contains a connection; parameter input port declarations must
be empty.
See:
Parameter Inputs
err:XS0036
All the step types in a pipeline or library must have unique
names: it is a static error if any step type name is built-in
and/or declared or defined more than once in the same scope.
See:
Scoping of Names
Handling Circular and Re-entrant Library
Imports (Non-Normative)
err:XS0037
It is a static error if any step directly contains text nodes
that do not consist entirely of whitespace.
See:
Common errors
err:XS0038
It is a static error if any required attribute is not
provided.
See:
Common errors
err:XS0039
It is a static error if the port specified on the
p:serialization is not the name of an output port on the pipeline
in which it appears or if more than one p:serialization element is
applied to the same port.
See:
p:serialization
err:XS0040
It is a static error to specify any value other than true.
See:
Parameter Inputs
err:XS0041
It is a static error to specify both binding and element on the
same p:namespaces element.
See:
Namespaces on variables,
options, and parameters
err:XS0042
It is a static error to attempt to provide a connection for an
input port on the declaration of an atomic step.
See:
Document Inputs
err:XS0044
It is a static error if any element in the XProc namespace or
any step has element children other than those specified for it by
this specification. In particular, the presence of atomic steps for
which there is no visible declaration may raise this error.
See:
Common errors
err:XS0048
It is a static error to use a declared step as a compound
step.
See:
Extension Steps
err:XS0051
It is a static error if the except-prefixes attribute on
p:namespaces does not contain a list of tokens or if any of those
tokens is not a prefix bound to a namespace in the in-scope
namespaces of the p:namespaces element.
See:
Namespaces on variables,
options, and parameters
err:XS0052
It is a static error if the URI of a p:import cannot be
retrieved or if, once retrieved, it does not point to a p:library,
p:declare-step, or p:pipeline.
See:
p:import
err:XS0053
It is a static error to import a single pipeline if that
pipeline does not have a type.
See:
p:import
err:XS0055
It is a static error if a primary parameter input port is
unconnected and the pipeline that contains the step has no primary
parameter input port unless at least one explicit p:with-param is
provided for that port.
See:
Parameter Inputs
err:XS0057
It is a static error if the exclude-inline-prefixes attribute
does not contain a list of tokens or if any of those tokens (except
#all or #default) is not a prefix bound to a namespace in the
in-scope namespaces of the element on which it occurs.
See:
p:inline
err:XS0058
It is a static error if the value #default is used within the
exclude-inline-prefixes attribute and there is no default namespace
in scope.
See:
p:inline
err:XS0059
It is a static error if the pipeline element is not p:pipeline,
p:declare-step, or p:library.
See:
Common errors
err:XS0060
It is a static error if the processor encounters an explicit
request for a previous version of the language and it is unable to
process the pipeline using those semantics.
See:
Backwards-compatible
Mode
err:XS0061
It is a static error if a use-when expression refers to the
context or attempts to refer to any documents or collections.
See:
Conditional Element
Exclusion
err:XS0062
It is a static error if a required version attribute is not
present.
See:
Versioning
Considerations
err:XS0063
It is a static error if the value of the version attribute is
not a xs:decimal.
See:
Versioning
Considerations
E.2 Dynamic Errors
The following dynamic errors are defined:
Dynamic Errors
err:XD0001
It is a dynamic error if a non-XML resource is produced on a
step output or arrives on a step input.
See:
Inputs and Outputs
err:XD0003
It is a dynamic error if the viewport source does not provide
exactly one document.
See:
p:viewport
p:viewport-source
err:XD0004
It is a dynamic error if no subpipeline is selected by the
p:choose and no default is provided.
See:
p:choose
err:XD0005
It is a dynamic error if more than one document appears on the
connection for the xpath-context.
See:
p:xpath-context
err:XD0006
If sequence is not specified, or has the value false, then it is
a dynamic error unless exactly one document appears on the declared
port.
See:
Document Inputs
p:exec
err:XD0007
If sequence is not specified on p:output, or has the value
false, then it is a dynamic error if the step does not produce
exactly one document on the declared port.
See:
p:output
err:XD0008
It is a dynamic error if a document sequence appears where a
document to be used as the context node is expected.
See:
Processor XPath Context
p:variable
p:with-option
p:with-param
err:XD0009
It is a dynamic error if the element attribute on p:namespaces
is specified and it does not identify a single element node.
See:
Namespaces on variables,
options, and parameters
err:XD0010
It is a dynamic error if the match expression on p:viewport does
not match an element or document.
See:
p:viewport
err:XD0011
It is a dynamic error if the resource referenced by a p:document
element does not exist, cannot be accessed, or is not a well-formed
XML document.
See:
p:document
err:XD0012
It is a dynamic error if any attempt is made to dereference a
URI where the scheme of the URI reference is not supported.
See:
Common errors
err:XD0013
It is a dynamic error if the specified namespace bindings are
inconsistent; that is, if the same prefix is bound to two different
namespace names.
See:
Namespaces on variables,
options, and parameters
err:XD0014
It is a dynamic error for any unqualified attribute names other
than “name”, “namespace”, or “value” to appear on a c:param
element.
See:
The c:param element
The c:param-set element
err:XD0015
It is a dynamic error if the specified QName cannot be resolved
with the in-scope namespace declarations.
See:
System Properties
err:XD0016
It is a dynamic error if the select expression on a p:input
returns atomic values or anything other than element or document
nodes (or an empty sequence).
See:
Document Inputs
err:XD0017
It is a dynamic error if the running pipeline attempts to invoke
a step which the processor does not know how to perform.
See:
Extension Steps
err:XD0018
It is a dynamic error if the parameter list contains any
elements other than c:param.
See:
The c:param-set element
err:XD0019
It is a dynamic error if any option value does not satisfy the
type required for that option.
See:
Common errors
err:XD0020
It is a dynamic error if the combination of serialization
options specified or defaulted is not allowed.
See:
p:serialization
err:XD0021
It is a dynamic error for a pipeline to attempt to access a
resource for which it has insufficient privileges or perform a step
which is forbidden.
See:
Security Considerations
err:XD0022
It is a dynamic error if a processor that does not support PSVI
annotations attempts to invoke a step which asserts that they are
required.
See:
PSVIs in XProc
err:XD0023
It is a dynamic error if an XPath expression is encountered
which cannot be evaluated (because it is syntactically incorrect,
contains references to unbound variables or unknown functions, or
for any other reason).
See:
XPaths in XProc
err:XD0024
It is a dynamic error if a 2.0 processor encounters an XPath 1.0
expression and it does not support XPath 1.0 compatibility
mode.
See:
XPaths in XProc
err:XD0025
It is a dynamic error if the namespace attribute is specified,
the name contains a colon, and the specified namespace is not the
same as the in-scope namespace binding for the specified
prefix.
See:
The c:param element
err:XD0026
It is a dynamic error if the select expression makes reference
to the context node, size, or position when the context item is
undefined.
See:
p:variable
p:option
p:with-option
p:with-param
err:XD0027
It is a dynamic error if the processor encounters an
xpath-version that it does not support.
See:
XPaths in XProc
err:XD0028
It is a dynamic error if any attribute value does not satisfy
the type required for that attribute.
See:
Common errors
err:XD0029
It is a dynamic error if the document referenced by a p:data
element does not exist, cannot be accessed, or cannot be encoded as
specified.
See:
p:data
err:XD0030
It is a dynamic error if a step is unable or incapable of
performing its function.
See:
Common errors
err:XD0031
It is a dynamic error to use the XProc namespace in the name of
a parameter.
See:
p:with-param
err:XD0033
It is a dynamic error if the name specified is not the name of
an in-scope option or variable.
See:
Value Available
err:XD0034
It is a dynamic error to specify a new namespace or prefix if
the lexical value of the specified name contains a colon (or if no
wrapper is explicitly specified).
See:
p:data
p:add-attribute
p:error
p:label-elements
p:pack
p:rename
p:wrap
p:wrap-sequence
E.3 Step Errors
The following dynamic errors can be raised by steps in this
specification:
Step Errors
err:XC0002
It is a dynamic error if the value starts with the string
“--”.
See:
Request Entity body
conversion
err:XC0003
It is a dynamic error if a username or password is specified
without specifying an auth-method, if the requested auth-method
isn't supported, or the authentication challenge contains an
authentication method that isn't supported.
See:
Specifying a request
err:XC0004
It is a dynamic error if the status-only attribute has the value
true and the detailed attribute does not have the value true.
See:
Specifying a request
err:XC0005
It is a dynamic error if the request contains a c:body or
c:multipart but the method does not allow for an entity body being
sent with the request.
See:
Specifying a request
err:XC0006
It is a dynamic error if the method is not specified on a
c:request.
See:
Specifying a request
err:XC0010
It is a dynamic error if an encoding of base64 is specified and
the character set is not specified or if the specified character
set is not supported by the implementation.
See:
p:unescape-markup
err:XC0012
It is a dynamic error if the contents of the directory path are
not available to the step due to access restrictions in the
environment in which the pipeline is run.
See:
p:directory-list
err:XC0013
It is a dynamic error if the pattern matches a processing
instruction and the new name has a non-null namespace.
See:
p:rename
err:XC0014
It is a dynamic error if the XML namespace
(http://www.w3.org/XML/1998/namespace) or the XMLNS namespace
(http://www.w3.org/2000/xmlns/) is the value of either the from
option or the to option.
See:
p:namespace-rename
err:XC0017
It is a dynamic error if the absolute path does not identify a
directory.
See:
p:directory-list
err:XC0019
It is a dynamic error if the documents are not equal, and the
value of the fail-if-not-equal option is true.
See:
p:compare
err:XC0020
It is a dynamic error if the user specifies a value or
values that are inconsistent with each other or with the
requirements of the step or protocol.
See:
Specifying a request
err:XC0022
It is a dynamic error if the content of the c:body element does
not consist of exactly one element, optionally preceded and/or
followed by any number of processing instructions, comments or
whitespace characters.
See:
Request Entity body
conversion
err:XC0023
It is a dynamic error if a select expression or match pattern
returns a node type that is not allowed by the step.
See:
Common errors
p:add-attribute
p:insert
p:label-elements
p:make-absolute-uris
p:rename
p:replace
p:set-attributes
p:unwrap
p:wrap
err:XC0025
It is a dynamic error if the match pattern matches anything
other than an element node and the value of the position option is
“first-child” or “last-child”.
See:
p:insert
err:XC0027
It is a dynamic error if the document is not valid or the step
doesn't support DTD validation.
See:
p:load
err:XC0028
It is a dynamic error if the content of the c:body element does
not consist entirely of characters.
See:
Request Entity body
conversion
err:XC0029
It is a dynamic error if an XInclude error occurs during
processing.
See:
p:xinclude
err:XC0030
It is a dynamic error if the override-content-type value cannot
be used (e.g. text/plain to override image/png).
See:
Managing the response
err:XC0033
It is a dynamic error if the command cannot be run.
See:
p:exec
err:XC0034
It is a dynamic error if the current working directory cannot be
changed to the value of the cwd option.
See:
p:exec
err:XC0035
It is a dynamic error to specify both result-is-xml and
wrap-result-lines.
See:
p:exec
err:XC0036
It is a dynamic error if the requested hash algorithm is not one
that the processor understands or if the value or parameters are
not appropriate for that algorithm.
See:
p:hash
err:XC0037
It is a dynamic error if the value provided is not a properly
x-www-form-urlencoded value.
See:
p:www-form-urldecode
err:XC0038
It is a dynamic error if the specified version of XSLT is not
available.
See:
p:xslt
err:XC0039
It is a dynamic error if a sequence of documents (including an
empty sequence) is provided to an XSLT 1.0 step.
See:
p:xslt
err:XC0040
It is a dynamic error if the document element of the document
that arrives on the source port is not c:request.
See:
p:http-request
err:XC0050
It is a dynamic error if the URI scheme is not supported or the
step cannot store to the specified location.
See:
p:store
err:XC0051
It is a dynamic error if the content-type specified is not
supported by the implementation.
See:
p:unescape-markup
err:XC0052
It is a dynamic error if the encoding specified is not supported
by the implementation.
See:
Request Entity body
conversion
p:unescape-markup
err:XC0053
It is a dynamic error if the assert-valid option is true and the
input document is not valid.
See:
p:validate-with-relax-ng
p:validate-with-xml-schema
err:XC0054
It is a dynamic error if the assert-valid option is true and any
Schematron assertions fail.
See:
p:validate-with-schematron
err:XC0055
It is a dynamic error if the implementation does not support the
specified mode.
See:
p:validate-with-xml-schema
err:XC0056
It is a dynamic error if the specified initial mode or named
template cannot be applied to the specified stylesheet.
See:
p:xslt
err:XC0057
It is a dynamic error if the sequence that results from
evaluating the XQuery contains items other than documents and
elements.
See:
p:xquery
err:XC0058
It is a dynamic error if the all and relative options are both
true.
See:
p:add-xml-base
err:XC0059
It is a dynamic error if the QName value in the attribute-name
option uses the prefix “xmlns” or any other prefix that resolves to
the namespace name “http://www.w3.org/2000/xmlns/”.
See:
p:add-attribute
err:XC0060
It is a dynamic error if the processor does not support the
specified version of the UUID algorithm.
See:
p:uuid
err:XC0061
It is a dynamic error if the name of any encoded parameter name
is not a valid xs:NCName.
See:
p:www-form-urldecode
err:XC0062
It is a dynamic error if the match option matches a namespace
node.
See:
p:delete
err:XC0063
It is a dynamic error if the path-separator option is specified
and is not exactly one character long.
See:
p:exec
err:XC0064
It is a dynamic error if the exit code from the command is
greater than the specified failure-threshold value.
See:
p:exec
err:XC0066
It is a dynamic error if the arg-separator option is specified
and is not exactly one character long.
See:
p:exec
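As an illustration of how a step error from this list can be handled
inside a pipeline, the following is a minimal sketch that catches a
validation failure (err:XC0053) with p:try and p:catch. The schema
file name schema.rng and the step name "recover" are assumptions made
for the example, and the exact content a processor places on the error
port may vary between implementations.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source"/>
  <!-- sequence="true" because the error port may carry more than one document -->
  <p:output port="result" sequence="true"/>

  <p:try>
    <p:group>
      <!-- assert-valid defaults to 'true', so an invalid source
           document raises err:XC0053 here.
           schema.rng is a hypothetical file name. -->
      <p:validate-with-relax-ng>
        <p:input port="schema">
          <p:document href="schema.rng"/>
        </p:input>
      </p:validate-with-relax-ng>
    </p:group>
    <p:catch name="recover">
      <!-- Pass along whatever the processor reports on the
           error port of the p:catch. -->
      <p:identity>
        <p:input port="source">
          <p:pipe step="recover" port="error"/>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

If the document is valid, the pipeline returns it unchanged apart from
any effects of validation; otherwise it returns the error report
instead of failing.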
F Guidance on Namespace Fixup (Non-Normative)
An XProc processor may find it necessary to add missing
namespace declarations to ensure that a document can be serialized.
While this process is implementation defined, the purpose of this
appendix is to provide guidance as to what an implementation might
do to either prevent such situations or fix them before
serialization.
When a namespace binding is generated, the prefix associated
with the QName of the element or attribute in question should be
used. From an Infoset perspective, this is accomplished by setting
the [prefix] on the element or attribute. Then when an
implementation needs to add a namespace binding, it can reuse that
prefix if possible. If reusing the prefix is not possible, the
implementation must generate a new prefix that is unique among the
in-scope namespaces of the element or owner element of the
attribute.
An implementation can avoid namespace fixup by making sure that
the standard step library does not output documents that require
fixup. The following list contains suggestions as to how to
accomplish this within the steps:
Any step that outputs an element in the step vocabulary
namespace must ensure that namespace is declared. An implementation
should generate a namespace binding using the prefix “c”.
When attributes are added by p:add-attribute or
p:set-attributes, the step must ensure the namespace of each added
attribute is declared. If the prefix used by the QName is not in
the in-scope namespaces of the element on which the attribute was
added, the step must add a namespace declaration of the prefix to
the in-scope namespaces. If the prefix is amongst the in-scope
namespaces and is not bound to the same namespace name, a new
prefix and namespace binding must be added. When a new prefix is
generated, the prefix associated with the attribute should be
changed to reflect that generated prefix value.
When an element is renamed by p:rename, the step must ensure the
namespace of the element is declared. If the prefix used by the
QName is not in the in-scope namespaces of the element being
renamed, the step must add a namespace declaration of the prefix to
the in-scope namespaces. If the prefix is amongst the in-scope
namespaces and is not bound to the same namespace name, a new
prefix and namespace binding must be added. When a new prefix is
generated, the prefix associated with the element should be changed
to reflect that generated prefix value.
If the element does not have a namespace name and there is a
default namespace, the default namespace must be undeclared. For
each of the child elements, the original default namespace
declaration must be preserved by adding a default namespace
declaration unless the child element has a different default
namespace.
When an attribute is renamed by p:rename, the step must ensure
the namespace of the renamed attribute is declared. If the prefix
used by the QName is not in the in-scope namespaces of the element
on which the attribute was added, the step must add a namespace
declaration of the prefix to the in-scope namespaces. If the prefix
is amongst the in-scope namespaces and is not bound to the same
namespace name, a new prefix and namespace binding must be added.
When a new prefix is generated, the prefix associated with the
attribute should be changed to reflect that generated prefix value.
When an element wraps content via p:wrap, there may be in-scope
namespaces coming from ancestor elements of the new wrapper
element. The step must ensure the namespace of the element is
declared properly. By default, the wrapper element will inherit the
in-scope namespaces of the parent element if one exists. As such,
there may be an existing namespace declaration or default
namespace.
If the prefix used by the QName is not in the in-scope
namespaces of the wrapper element, the step must add a namespace
declaration of the prefix to the in-scope namespaces. If the prefix
is amongst the in-scope namespaces and is not bound to the same
namespace name, a new prefix and namespace binding must be added.
When a new prefix is generated, the prefix associated with the
wrapper element should be changed to reflect that generated prefix
value.
If the element does not have a namespace name and there is a
default namespace, the default namespace must be undeclared. For
each of the child elements, the original default namespace
declaration must be preserved by adding a default namespace
declaration unless the child element has a different default
namespace.
When the wrapper element is added for p:wrap-sequence or p:pack,
the prefix used by the QName must be added to the in-scope
namespaces.
When an element is removed via p:unwrap, any in-scope namespaces
that are declared on the element must be copied to any child
element except when the child element declares the same prefix or
declares a new default namespace.
In the output from p:xslt, if an element was generated from
xsl:element or an attribute from xsl:attribute, the step must
guarantee that a namespace declaration exists for the namespace
name used. Depending on the XSLT implementation, the namespace
declaration for the namespace name of the element or attribute may
not be declared. It may also be the case that the original prefix
is available. If the original prefix is available, the step should
attempt to re-use that prefix. Otherwise, it must generate a prefix
for a namespace binding and change the prefix associated with the
element or attribute.
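To make the prefix-conflict case for p:add-attribute concrete, here is
a small sketch of a pipeline whose output needs fixup. The element
names, the attribute value, and both example.com namespace URIs are
hypothetical and chosen only for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source">
    <p:inline>
      <!-- In the input, the prefix "ex" is bound to .../ns/one. -->
      <doc xmlns:ex="http://example.com/ns/one"><item/></doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- On this step, "ex" is bound to a different namespace,
       .../ns/two, so the new attribute's prefix clashes with the
       binding already in scope on <item>. -->
  <p:add-attribute match="item"
                   attribute-name="ex:status"
                   attribute-value="draft"
                   xmlns:ex="http://example.com/ns/two"/>
</p:declare-step>
```

Because "ex" is already amongst the in-scope namespaces of the item
element and is bound to a different namespace name, the guidance above
suggests that the step add a new prefix and namespace binding for the
attribute rather than reuse "ex". Which prefix is generated is left to
the implementation.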
G Handling Circular and Re-entrant Library Imports (Non-Normative)
When handling imports, an implementation needs to be able to
detect the following situations, and distinguish them from cases
where multiple import chains produce genuinely conflicting step
definitions:
Circular imports: A imports B, B imports A.
Re-entrant imports: A imports B and C, B imports D, C imports
D.
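As an illustration of the re-entrant case, the sketch below shows what
library A might look like. The file names a.xpl, b.xpl, c.xpl, and
d.xpl are hypothetical, and the comments state what the other three
files are assumed to contain.

```xml
<!-- a.xpl (hypothetical): A imports B and C. -->
<p:library xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- b.xpl is assumed to be a p:library that contains
       <p:import href="d.xpl"/>. -->
  <p:import href="b.xpl"/>
  <!-- c.xpl is assumed to be a p:library that also contains
       <p:import href="d.xpl"/>. -->
  <p:import href="c.xpl"/>
  <!-- d.xpl is assumed to declare the step types shared by B and C. -->
</p:library>
```

Both import chains reach d.xpl, so a naive traversal would encounter
its step types twice. An implementation needs to recognize that these
are the same declarations reached by two routes, not genuinely
conflicting definitions.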
One way to achieve this is as follows:
[Definition: The step type exports of an XProc element, against
the background of a set of URIs of resources already visited (call
this set Visited), are defined by cases.]
The step type exports of an XProc element are as follows:
p:pipeline, p:declare-step
A singleton bag containing the type of the element.
p:library
The bag-merger of the step type exports of all the element's
children.
p:import
Let RU be the actual resolved URI of the resource identified by
the href of the element. If RU is a member of Visited, then an
empty bag; otherwise update Visited by adding RU to it, and return
the step type exports of the document element of the retrieved
representation.
all other elements
An empty bag.
The changes to Visited mandated by the p:import case above are
persistent, not scoped. That is, not only the recursive processing
of the imported resource but also subsequent processing of siblings
and ancestors must be against the background of the updated value.
In practice this means either using a side-effected global
variable, or not only passing Visited as an argument to any
recursive or iterative processing, but also returning its updated
value for subsequent use, along with the bag of step types.
Given a pipeline library document with actual resolved URI DU,
it is a static error (err:XS0036) if the step type exports of the
document element of the retrieved representation, against the
background of a singleton set containing DU as the initial Visited
set, contains any duplicates.
Given a top-level pipeline document with actual resolved URI DU,
it is a static error (err:XS0036) if the bag-merger of the step
type exports of the document element of the retrieved
representation with the step type exports of its children, against
the background of a singleton set containing DU as the initial
Visited set, contains any duplicates.
Given a non-top-level p:pipeline or p:declare-step element, it
is a static error (err:XS0036) if the bag-merger of the step type
exports of its parent with the step type exports of its children,
against the background of a copy of the Visited set of its parent
as the initial Visited set, contains any duplicates.
The phrase "a copy of the Visited set" in the preceding
paragraph is meant to indicate that checking of non-top-level
p:pipeline or p:declare-step elements does not have a persistent
impact on the checking of its parent. The contrast is that whereas
changes to Visited pass both up and down through p:import, they
pass only down through p:pipeline and p:declare-step.
[Definition: The bag-merger of two or more bags (where a bag is
an unordered list or, equivalently, something like a set except
that it may contain duplicates) is a bag constructed by starting
with an empty bag and adding each member of each of the input bags
in turn to it. It follows that the cardinality of the result is the
sum of the cardinality of all the input bags.]
H Sequential steps, parallelism, and side-effects
XProc imposes as few constraints on the order in which steps
must be evaluated as possible and almost no constraints on parallel
execution.
In the simple, and we believe overwhelmingly common case, inputs
flow into the pipeline, through the pipeline from one step to the
next, and results are produced at the end. The order of the steps
is constrained by the input/output connections between them.
Implementations are free to execute them in a purely sequential
fashion or in parallel, as they see fit. The results are the same
in either case.
This is not true for pipelines which rely on side effects, such
as the state of the filesystem or the state of the web. Consider
the following pipeline:
name="main">
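The following is a hedged sketch of the kind of pipeline this
paragraph has in mind, not a verbatim listing. The step names "style"
and "save-xslt" come from the surrounding text; the step name
"generate-stylesheet" and the file names config.xml, generator.xsl,
and gen-style.xsl are assumptions made for the example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Generate a stylesheet (input file names are hypothetical). -->
  <p:xslt name="generate-stylesheet">
    <p:input port="source">
      <p:document href="config.xml"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="generator.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Side effect: write the generated stylesheet to disk. -->
  <p:store name="save-xslt" href="gen-style.xsl"/>

  <!-- Reads the stored file back by URI. No connection links this
       step to "save-xslt", so their relative order is not fixed. -->
  <p:xslt name="style">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="gen-style.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
</p:declare-step>
```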
There's no guarantee that the “style” step will execute after the
“save-xslt” step. In this case, the solution is straightforward.
Even if you need the saved stylesheet, you don't need to rely on it
in your pipeline:
name="main">
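A hedged sketch of such a rewritten pipeline follows; it is an
illustration, not a verbatim listing. The step name
"generate-stylesheet" and the file names config.xml, generator.xsl,
and gen-style.xsl are assumptions made for the example. The stylesheet
is still stored, but the “style” step reads it from a connection
instead of from the filesystem.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Generate a stylesheet (input file names are hypothetical). -->
  <p:xslt name="generate-stylesheet">
    <p:input port="source">
      <p:document href="config.xml"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="generator.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Still saved for later use, but nothing below depends on it. -->
  <p:store name="save-xslt" href="gen-style.xsl"/>

  <!-- The stylesheet arrives over an explicit connection, so this
       step cannot run before "generate-stylesheet" has produced it. -->
  <p:xslt name="style">
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
    <p:input port="stylesheet">
      <p:pipe step="generate-stylesheet" port="result"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
</p:declare-step>
```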
Now the result is independent of the implementation strategy.
Implementations are free to invent additional control structures
using p:pipeinfo and extension attributes to provide greater
control over parallelism in their implementations.
I The application/xproc+xml media type
This appendix registers a new MIME media type,
“application/xproc+xml”.
I.1 Registration of MIME media type application/xproc+xml
MIME media type name: application
MIME subtype name: xproc+xml
Required parameters: None.
Optional parameters:
charset
This parameter has identical semantics to the charset parameter
of the application/xml media type as specified in [RFC 3023] or its
successors.
Encoding considerations:
By virtue of XProc content being XML, it has the same
considerations when sent as “application/xproc+xml” as does XML.
See [RFC 3023], Section 3.2.
Security considerations:
Several XProc elements may refer to arbitrary URIs. In this
case, the security issues of [RFC 2396], section 7, should be
considered.
In addition, because of the extensibility features of XProc, it
is possible that “application/xproc+xml” may describe content that
has security implications beyond those described here. However,
only in the case where the processor recognizes and processes the
additional content, or where further processing of that content is
dispatched to other processors, would security issues potentially
arise. And in that case, they would fall outside the domain of this
registration document.
Interoperability considerations:
This specification describes processing semantics that dictate
behavior that must be followed when dealing with, among other
things, unrecognized elements.
Because XProc is extensible, conformant “application/xproc+xml”
processors can expect that content received is well-formed XML, but
it cannot be guaranteed that the content is valid XProc or that the
processor will recognize all of the elements and attributes in the
document.
Published specification:
This media type registration is for XProc documents as described
by this specification which is located at
Applications which use this media type:
There is no experimental, vendor specific, or personal tree
predecessor to “application/xproc+xml”, reflecting the fact that
no applications currently recognize it. This new type is being
registered in order to allow for the deployment of XProc on the
World Wide Web, as a first class XML application.
Additional information:
Magic number(s):
There is no single initial octet sequence that is always present
in XProc documents.
File extension(s):
XProc documents are most often identified with the extension “.xpl”.
Macintosh File Type Code(s):
TEXT
Person & email address to contact for further information:
Norman Walsh, Norman.Walsh@MarkLogic.com
Intended usage:
COMMON
Author/Change controller:
The XProc specification is a work product of the World Wide Web
Consortium's XML Processing Model Working Group. The W3C has change
control over these specifications.
I.2 Fragment Identifiers
For documents labeled as “application/xproc+xml”, the fragment
identifier notation is exactly that for “application/xml”, as
specified in [RFC 3023] or its successors.