OAI-PMH Implementation Guidelines - Specification and XML Schema for the OAI Identifier Format
Implementation Guidelines for the Open Archives Initiative Protocol for Metadata Harvesting
- Specification and XML Schema for the OAI Identifier Format
Protocol Version 2.0 of 2002-06-14
Document Version 2006/03/09T19:52:00Z
Editors
The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library
From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov> -- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computer Science
This document is one part of the Implementation Guidelines that accompany the
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
Specification and XML Schema for the OAI Identifier Format
1. Introduction
The OAI identifier format is intended to provide persistent resource
identifiers for items in repositories that implement OAI-PMH. This
is just one possible format that may be used for identifiers within
OAI-PMH. oai-identifiers are Uniform Resource Names (URNs) in the sense
of RFC1737; they are resource identifiers and not resource locators (URLs).
Note that here the resource is the metadata (the items) and not the
underlying object or "stuff" that the metadata describes. Correspondence
between an oai-identifier and any identifier that the object described by
the metadata may have is outside the scope of this specification and of the
OAI-PMH.
Adherence to standards and accord with existing schemes is discussed at the
end of this document.
2. Description
2.1 Syntax
The oai-identifier syntax is a restriction of the "general, absolute URI"
syntax:
<scheme>:<scheme-specific-part>
defined in RFC 2396. The following description uses the same notational
conventions as RFC 2396 and the same definitions of digit, alpha, alphanum,
reserved, unreserved and uric.
oai-identifier = scheme ":" namespace-identifier ":" local-identifier

scheme = "oai"

namespace-identifier = domainname-word "." domainname
domainname = domainname-word [ "." domainname ]
domainname-word = alpha *( alphanum | "-" )

local-identifier = 1*uric
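The three-part structure above can be pulled apart with a two-colon split. The following is a minimal sketch, not part of the specification; the names OaiIdentifier and split_oai_identifier are hypothetical. It relies on the fact that a namespace-identifier cannot contain a colon, so only the first two colons are separators and any later colon belongs to the local-identifier. It does no syntax validation.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    """The three components named in the grammar (hypothetical helper type)."""
    scheme: str
    namespace_identifier: str
    local_identifier: str


def split_oai_identifier(identifier: str) -> OaiIdentifier:
    """Split on the first two colons only.

    ':' is in the reserved set, so it may appear unescaped inside the
    local-identifier; splitting on every colon would be wrong.
    This function only separates the parts, it does not validate them.
    """
    parts = identifier.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected scheme:namespace:local, got {identifier!r}")
    return OaiIdentifier(*parts)
```

For instance, split_oai_identifier("oai:arXiv.org:hep-th/9901001") yields the components "oai", "arXiv.org" and "hep-th/9901001".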
Any uric elements are permitted in the local-identifier. Since characters in
the reserved set do not have any special meaning in the local-identifier
component, they are permitted unescaped. All characters not included in the
unreserved and reserved sets must be escaped (using the same encoding as
OAI-PMH requests). Characters in the unreserved and reserved sets must not be
escaped.
An oai-identifier should never be unescaped; the sole purpose of permitting
escaped characters is to allow repositories to map any internal identifier to
the local-identifier part of an oai-identifier.
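The mapping from an internal identifier to a local-identifier might be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name make_local_identifier is hypothetical, and the code assumes that a character outside the reserved and unreserved sets is escaped as the percent-encoded bytes of its UTF-8 representation. Hex digits are written in uppercase, as this specification requires.

```python
# Character sets as defined in RFC 2396.
RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"
ALPHANUM = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
)
# Characters that must NOT be escaped in a local-identifier.
UNESCAPED = frozenset(RESERVED + MARK + ALPHANUM)


def make_local_identifier(internal_id: str) -> str:
    """Escape every character outside the reserved and unreserved sets.

    Assumption: escaping is applied to the UTF-8 bytes of the character.
    Note that '%' itself is in neither set, so it becomes %25.
    """
    if not internal_id:
        raise ValueError("a local-identifier needs at least one character")
    out = []
    for ch in internal_id:
        if ch in UNESCAPED:
            out.append(ch)
        else:
            # Uppercase hex digits (the :02X format) keep the result canonical.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)
```

With this sketch an internal identifier "ab cd" becomes "ab%20cd", while "ab?cd" is returned unchanged because the question mark is in the reserved set.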
The following definitions are copied from RFC 2396 for convenience:
uric = reserved | unreserved | escaped
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
To avoid the possibility of inconsistently generated escaped characters in an
oai-identifier, the hex digits must use uppercase for the letters A through F.
This is a further restriction on RFC 2396. Thus, escaped and hex are defined
as follows:
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F"
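The grammar in this section can be transcribed into a regular expression for a quick syntax check. The sketch below is one possible transcription of the BNF, with hypothetical names (WORD, URIC, OAI_IDENTIFIER, is_valid_oai_identifier). It checks syntax only; it cannot tell whether the namespace-identifier is a domain name the organization has actually registered.

```python
import re

# domainname-word = alpha *( alphanum | "-" )
WORD = r"[A-Za-z][A-Za-z0-9-]*"

# uric = reserved | unreserved | escaped, with escaped restricted to
# uppercase hex digits as required by this specification.
URIC = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-F]{2})"

# oai-identifier = "oai" ":" namespace-identifier ":" local-identifier
# namespace-identifier needs at least two dot-separated words.
OAI_IDENTIFIER = re.compile(rf"oai:{WORD}(?:\.{WORD})+:{URIC}+")


def is_valid_oai_identifier(candidate: str) -> bool:
    """Return True if the whole string matches the grammar."""
    return OAI_IDENTIFIER.fullmatch(candidate) is not None
```

Applied to the examples in section 2.6, this check accepts oai:wibble.org:ab%20cd and rejects oai:wibble.org:ab%3ccd, because a lowercase hex digit does not match the escaped production.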
2.2 Namespace Identifier
Organizations must choose namespace-identifier values which correspond to a
domain name that they have registered and are committed to maintaining. Note
that since the oai-identifier is case-sensitive, a particular capitalization
style must be selected and used consistently. A single domain name should not
be used with variant capitalizations.
Domain name registration is used to avoid the need for any additional
registration service for oai-identifiers. Domain-name-based identifiers
guarantee global uniqueness without the need for OAI registration as required
with the earlier v1.0/1.1 specification.
2.3 Equivalence
Two oai-identifiers are equivalent if they are identical strings. All three
parts of the oai-identifier are case sensitive. Any escaped elements must be
left escaped; there is no ambiguity because it is permissible (and required)
only to escape characters that cannot be included directly.
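Because equivalence is plain string identity, comparison and deduplication need no normalization step. The sketch below illustrates this; the function names are hypothetical, and the point is what the code deliberately leaves out (case folding and unescaping).

```python
def are_equivalent(a: str, b: str) -> bool:
    """Equivalence is string identity: no case folding, no unescaping."""
    return a == b


def deduplicate(identifiers):
    """Drop repeated identifiers, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for identifier in identifiers:
        if identifier not in seen:
            seen.add(identifier)
            unique.append(identifier)
    return unique
```

Under this rule oai:foo.org:some-local-id-53 and oai:FOO.ORG:some-local-id-53 are distinct identifiers, which is why a repository should settle on one capitalization of its domain name.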
2.4 Backwards Compatibility
An oai-identifier scheme was introduced in OAI-PMH v1.0 and remained unchanged
in OAI-PMH v1.1. This scheme has been widely adopted and existing identifiers
may continue to be used by referring to the old schema.
To use this new oai-identifier scheme, repositories must make the following
changes:
Change the Identify response to refer to the new schema.
Choose and adopt a new domain-name-based namespace-identifier to replace the
repository-identifier. A single namespace-identifier may be used for
identifiers in multiple repositories operated by the same organization. The
same oai-identifier description block would then be used in the responses to
Identify requests for each repository. Uniqueness of the namespace-identifier
is guaranteed through domain name registration and not through registration
with the OAI validation service as it was with v1.0/1.1.
Ensure that the local-identifier components of any identifiers exposed use the
restricted character set (uric) of this specification. This may mean that
internal identifiers need to be escaped to create the local-identifier
component. Some characters that were used with the earlier oai-identifier
scheme, including #, may no longer be used in the local-identifier component.
2.5 Use as Arguments in OAI-PMH Requests
When used as an argument in an OAI-PMH request, an oai-identifier must be
correctly encoded. This means that the colon (:) separators and the percent
(%) characters of escaped characters in the local-identifier part must be URL
encoded. For example, the oai-identifier
oai:an.oai.org:ab%3Ccd
would be encoded as
identifier=oai%3Aan.oai.org%3Aab%253Ccd
in an OAI-PMH request.
This means that characters in some internal identifier that an oai-identifier
is derived from may be URL encoded twice: once to make the oai-identifier, and
a second time to express the oai-identifier in a URL. The URL will be decoded
once to recover the oai-identifier.
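The second encoding step can be sketched with Python's standard urllib.parse module. The function names and the base URL used in the usage note are hypothetical. Note that quote with safe="" encodes more characters than the colon and percent sign mentioned above; a server that decodes the URL once still recovers the same oai-identifier.

```python
from urllib.parse import quote, unquote, urlencode


def encode_identifier_argument(oai_identifier: str) -> str:
    """URL encode an oai-identifier for use as a request argument.

    The identifier is encoded exactly as it stands: any %XX sequences it
    already contains are NOT unescaped first, so '%' becomes '%25'.
    """
    return quote(oai_identifier, safe="")


def get_record_url(base_url: str, oai_identifier: str,
                   metadata_prefix: str = "oai_dc") -> str:
    """Build a GetRecord request URL; urlencode applies the same encoding."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": oai_identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def decode_identifier_argument(encoded: str) -> str:
    """Decode once, as a server would, to recover the oai-identifier."""
    return unquote(encoded)
```

For the example in this section, encode_identifier_argument("oai:an.oai.org:ab%3Ccd") returns "oai%3Aan.oai.org%3Aab%253Ccd", and decoding that value once gives back the original identifier with its %3C still escaped. A call such as get_record_url("https://repository.example.org/oai", "oai:an.oai.org:ab%3Ccd") uses a made-up base URL purely for illustration.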
2.6 Examples
The following are valid oai-identifiers:
oai:arXiv.org:hep-th/9901001

oai:foo.org:some-local-id-53
oai:FOO.ORG:some-local-id-53 ;not the same as above,
;should not use foo.org _and_ FOO.ORG

oai:foo.org:some-local-id-54
oai:foo.org:Some-Local-Id-54 ;not the same as above, distinct identifier

oai:wibble.org:ab%20cd ;space in internal id correctly escaped
oai:wibble.org:ab?cd ;question mark should not be escaped
The following are not valid oai-identifiers:
something:arXiv.org:hep-th/9901001 ;bad scheme

oai:999:abc123 ;namespace-identifier must not start with digit
oai:wibble:abc123 ;namespace-identifier must be domain name

oai:wibble.org:ab cd ;space not permitted (must be escaped as %20)
oai:wibble.org:ab#cd ;# not permitted
oai:wibble.org:ab<cd ;< not permitted (must be escaped as %3C)
oai:wibble.org:ab%3ccd ;< must be escaped as %3C not %3c
3. XML Schema for description container
The following XML schema, oai-identifier.xsd, defines the format of a
description container in the Identify response so that repositories may expose
their compliance with the oai-identifier format.
The value of the repositoryIdentifier element is the namespace-identifier,
which is not bound to a single repository. The element name was kept to
maintain continuity with v1.0/1.1 of this specification.
description
for repositories that share the OAI format for unique identifiers of records
xmlns:oai-identifier="http://www.openarchives.org/OAI/2.0/oai-identifier"
xmlns="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
attributeFormDefault="unqualified">



Schema for description section of Identify reply of OAI-PMH v2.0.
For repositories that comply with the oai format for unique identifiers
for items records.
See: http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
Validated with http://www.w3.org/2001/03/webdata/xsv on 16May2002
Simeon Warner $Date: 2002/06/21 20:14:34 $



type="string" fixed="oai"/>
type="oai-identifier:repositoryIdentifierType"/>
type="string" fixed=":"/>
type="oai-identifier:sampleIdentifierType"/>







value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"/>




This Schema is available at
3.1 Examples
The following examples are excerpts from Identify responses, which may contain
zero or more description containers.

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
oai
bespa.org
:
oai:bespa.org:medi99-123



xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
oai
oai-stuff.foo.org
:
oai:oai-stuff.foo.org:5324


4. Adherence to standards and accord with existing schemes
The following two sections describe how the oai-identifier meets the
requirements for URN schemes outlined in RFC1737.
4.1 Functional requirements
Global scope: oai-identifiers should have global scope in the sense that two
equivalent oai-identifiers should have the same meaning everywhere (i.e. they
identify the same metadata item).
Global uniqueness: the same oai-identifier should never be assigned to
different metadata items. To be useful for deduplication, the same metadata
item should not have more than one oai-identifier. Note that this does not
imply that there will not be more than one metadata item (and hence
oai-identifier) describing the same underlying resource.
Persistence: it is intended that oai-identifiers will be permanent. That is,
oai-identifiers must remain globally unique and items should retain the same
oai-identifier. (This is considerably weaker than RFC1737.)
Scalability: availability of oai-identifiers should not be limited by the
syntax. Separation into two parts, a namespace-identifier and a
local-identifier, assures scalability in the same way as other URI schemes.
Legacy support: this revision of oai-identifiers does not accommodate existing
oai-identifiers created for use with OAI-PMH versions 1.0 and 1.1.
Repositories wishing to use that scheme may still do so; see "Backwards
Compatibility".
Extensibility: the oai-identifier scheme is designed around a model of
namespace-identifier and local-identifier. While the syntax of
local-identifier is undefined and may be used for some possible extensions,
the rest of the syntax is not. A more complex scheme could be supported by
extension of the namespace-identifier syntax or by the creation of a new URI
scheme (OAI-PMH allows arbitrary URIs as identifiers). (This is considerably
weaker than RFC1737.)
Resolution: oai-identifiers are intended to serve as identifiers for metadata
items within repositories. It is not intended that oai-identifiers be used
outside the context of a set of interacting repositories and harvesters.
With knowledge of the repository that an oai-identifier was obtained from, it
will be possible to obtain the status of the item and to disseminate metadata
from it (provided the OAI-PMH interface is operational).
No general resolution scheme is proposed or imagined. Any such scheme would
involve an additional registration database. (This is considerably weaker
than RFC1737.)
4.2 Encoding requirements
oai-identifiers are not designed for human use; they are designed to be used
only with the OAI-PMH. As such, presentation in text, electronic mail etc. is
not important. This makes the encoding requirements considerably simpler than
those described in RFC1737.
Single encoding: there should be just one way to write an oai-identifier.
Simple comparison: there should be a trivial and local algorithm to compare
two oai-identifiers.
Transport friendliness: oai-identifiers should be able to be transported
unmodified over common Internet protocols (e.g. HTTP) and using common
encoding standards (e.g. XML, RDF).
Machine consumption: oai-identifiers should be easy to parse.
Ease of use: oai-identifiers should be short so that transmitting them and
managing them within computer programs is convenient.
Acknowledgements
Support for the development of the OAI-PMH and for other Open Archives
Initiative activities comes from the Digital Library Federation, the Coalition
for Networked Information, and from the National Science Foundation through
Grant No. IIS-9817416. Individuals who have played a significant role in the
development of OAI-PMH version 2.0 are acknowledged in the protocol document.
Document History
2006-03-09: Added clarification that repositoryIdentifier is the container for
the namespace-identifier and is not bound to a particular repository.
2002-06-21: Added type definitions to scheme and delimiter elements in schema.
2002-06-14: Release of this document, combined with the release of OAI-PMH
version 2.0.
This work is licensed under a Creative Commons License.