RDF 1.1 Turtle
The Resource Description Framework (RDF) is a
general-purpose language for representing information in the Web.
This document defines a textual syntax for RDF called Turtle
that allows an RDF graph to be completely written in a compact and
natural text form, with abbreviations for common usage patterns and
datatypes. Turtle provides levels of compatibility with the
N-Triples [[N-TRIPLES]]
format as well as the triple pattern syntax of the
SPARQL
W3C Recommendation.
This document is a part of the RDF 1.1 document suite. The
document defines Turtle, the Terse RDF Triple Language, a concrete
syntax for RDF [[RDF11-CONCEPTS]].
Introduction
This document defines Turtle, the Terse RDF
Triple Language, a concrete syntax for
RDF [[!RDF11-CONCEPTS]].
A Turtle document is a textual representation of an RDF graph. The following Turtle document describes the relationship between Green Goblin and Spiderman.
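A minimal sketch of such a document is given here. The base IRI and the rel: vocabulary IRI are assumptions placed under example.org purely for illustration; the foaf: namespace is the usual FOAF one.

```turtle
# Assumed base IRI; relative IRIs such as <#spiderman> resolve against it.
@base <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
# Hypothetical vocabulary IRI for the relationship terms.
@prefix rel: <http://example.org/relationship/> .

<#green-goblin>
    rel:enemyOf <#spiderman> ;      # predicate list: ';' repeats the subject
    a foaf:Person ;                 # 'a' stands for rdf:type
    foaf:name "Green Goblin" .

<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .   # object list: ',' repeats subject and predicate
```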
This example introduces many of the features of the Turtle language:
@base and Relative IRIs
@prefix and prefixed names
predicate lists separated by ';'
object lists separated by ','
the token a
and literals
The Turtle grammar for triples is a subset of the SPARQL 1.1 Query Language [[SPARQL11-QUERY]] grammar for TriplesBlock.
The two grammars share production and terminal names where possible.
The construction of an RDF graph from a Turtle document is defined in Turtle Grammar and Parsing.
Turtle Language
A Turtle document allows writing down an RDF graph in a compact textual form. An RDF graph is made up of
triples
consisting of a subject, predicate and object.
Comments may be given after a '#' that is not part of another lexical token and continue to the end of the line.
Simple Triples
The simplest triple statement is a sequence of (subject, predicate, object) terms, separated by whitespace and terminated by '.' after each triple.
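As an illustration, a single simple triple might look like this; the IRIs are hypothetical and written out in full so that no directives are needed.

```turtle
# subject, predicate and object, each an absolute IRI, terminated by '.'
<http://example.org/#spiderman> <http://example.org/relationship/enemyOf> <http://example.org/#green-goblin> .
```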
Predicate Lists
Often the same subject will be referenced by a number of predicates. The predicateObjectList production matches a series of predicates and objects, separated by ';', following a subject.
This expresses a series of RDF Triples with that subject and each predicate and object allocated to one triple.
Thus, the ';' symbol is used to repeat the subject of triples that vary only in predicate and object RDF terms.
These two examples are equivalent ways of writing the triples about Spiderman.
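A sketch of the two equivalent forms, assuming hypothetical example.org IRIs for the resources and the relationship vocabulary:

```turtle
# Form 1: predicate list, the subject is written once
<http://example.org/#spiderman> <http://example.org/relationship/enemyOf> <http://example.org/#green-goblin> ;
                                <http://xmlns.com/foaf/0.1/name> "Spiderman" .

# Form 2: the same two triples written out in full
<http://example.org/#spiderman> <http://example.org/relationship/enemyOf> <http://example.org/#green-goblin> .
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman" .
```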
Object Lists
As with predicates, objects are often repeated with the same subject and predicate. The objectList production matches a series of objects separated by ',' following a predicate.
This expresses a series of RDF Triples with the corresponding subject and predicate and each object allocated to one triple.
Thus, the ',' symbol is used to repeat the subject and predicate of triples that only differ in the object RDF term.
These two examples are equivalent ways of writing Spiderman's name in two languages.
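A sketch of the two equivalent forms; the subject IRI is a hypothetical example.org IRI.

```turtle
# Form 1: object list, subject and predicate are written once
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman", "Человек-паук"@ru .

# Form 2: the same two triples written out in full
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Spiderman" .
<http://example.org/#spiderman> <http://xmlns.com/foaf/0.1/name> "Человек-паук"@ru .
```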
There are three types of RDF Term defined in RDF Concepts: IRIs (Internationalized Resource Identifiers), literals and blank nodes. Turtle provides a number
of ways of writing each.
IRIs
IRIs
may be written as relative or absolute IRIs or prefixed names.
Relative and absolute IRIs are enclosed in '<' and '>' and may contain
numeric escape sequences
(described below). For example
Relative IRIs like <#green-goblin> are resolved relative to the current base IRI. A new base IRI can be defined using the '@base' or 'BASE' directive. Specifics of this operation are defined in IRI References.
The token 'a' in the predicate position of a Turtle triple represents the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type .
A prefixed name is a prefix label and a local part, separated by a colon ":".
A prefixed name is turned into an IRI by concatenating the IRI associated with the prefix and the local part. The '
@prefix
' or '
PREFIX
' directive associates a prefix label with an IRI.
Subsequent '
@prefix
' or '
PREFIX
' directives may re-map the same prefix label.
The Turtle language originally permitted only the syntax including the '@' character for writing prefix and base directives.
The case-insensitive '
PREFIX
' and '
BASE
' forms were added to align Turtle's syntax with that of SPARQL.
It is advisable to serialize RDF using the '
@prefix
' and '
@base
' forms until RDF 1.1 Turtle parsers are widely deployed.
To write
using a prefixed name:
Define a prefix label for the vocabulary IRI
as
somePrefix
Then write
somePrefix:enemyOf
which is equivalent to writing
This can be written using either the original Turtle syntax for prefix declarations:
or SPARQL's syntax for prefix declarations:
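Both declaration styles can be sketched as follows. The vocabulary IRI bound to somePrefix is a hypothetical example.org IRI, as are the subject and object IRIs.

```turtle
# Original Turtle syntax: leading '@' and trailing '.'
@prefix somePrefix: <http://example.org/relationship/> .
<http://example.org/#green-goblin> somePrefix:enemyOf <http://example.org/#spiderman> .

# SPARQL-style syntax: no '@' and no trailing '.'
PREFIX somePrefix: <http://example.org/relationship/>
<http://example.org/#green-goblin> somePrefix:enemyOf <http://example.org/#spiderman> .
```

In both cases somePrefix:enemyOf expands to the same IRI, so the two statements denote the same triple.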
Prefixed names are a superset of XML QNames.
They differ in that the local part of prefixed names may include:
leading digits, e.g. leg:3032571 or isbn13:9780136019701
non leading colons, e.g. og:video:height
reserved character escape sequences, e.g. wgs:lat\-long
The following Turtle document contains examples of all the different ways of writing IRIs in Turtle.
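A sketch of such a document, using the reserved .example domains as placeholder IRIs (all names are illustrative assumptions):

```turtle
# A triple with all absolute IRIs
<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> .

@base <http://one.example/> .
<subject2> <predicate2> <object2> .     # relative IRIs, e.g. http://one.example/subject2

BASE <http://one.example/>
<subject2> <predicate2> <object2> .     # same triple, SPARQL-style directive

@prefix p: <http://two.example/> .
p:subject3 p:predicate3 p:object3 .     # prefixed names, e.g. http://two.example/subject3

PREFIX p: <http://two.example/>
p:subject3 p:predicate3 p:object3 .     # same triple, SPARQL-style directive

@prefix p: <path/> .                    # p: now stands for http://one.example/path/
p:subject4 p:predicate4 p:object4 .     # e.g. http://one.example/path/subject4

@prefix : <http://another.example/> .   # empty prefix label
:subject5 :predicate5 :object5 .        # e.g. http://another.example/subject5

:subject6 a :subject7 .                 # 'a' is http://www.w3.org/1999/02/22-rdf-syntax-ns#type
```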
The '@prefix' and '@base' directives require a trailing '.' after the IRI; the equivalent 'PREFIX' and 'BASE' must not have a trailing '.' after the IRI part of the directive.
RDF Literals
Literals
are used to identify values such as strings, numbers, dates.
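A few quoted literals, sketched ahead of the detailed rules in the next subsection. The ex: vocabulary is a hypothetical namespace used only for illustration.

```turtle
@prefix ex: <http://example.org/vocab#> .          # hypothetical vocabulary
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:item ex:title "That Seventies Show" .              # no tag, no datatype: xsd:string
ex:item ex:title "That Seventies Show"^^xsd:string .  # same literal, datatype written out
ex:item ex:title 'That Seventies Show'@en .           # single-quote delimiter with a language tag
ex:item ex:text "Line one\nLine two" .                # \n escape; a raw line break is not allowed here
ex:item ex:blurb """A long literal may contain
raw line breaks and "double quotes".""" .
ex:item ex:note '''It's fine to use ' and " here.''' .
```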
Quoted Literals
Quoted Literals (Grammar production RDFLiteral) have a lexical form followed by a language tag, a datatype IRI, or neither.
The representation of the lexical form consists of an initial delimiter, e.g. " (U+0022), a sequence of permitted characters or numeric escape sequence or string escape sequence, and a final delimiter.
The corresponding RDF lexical form is the characters between the delimiters, after processing any escape sequences.
If present, the language tag is preceded by a '@' (U+0040).
If there is no language tag, there may be a datatype IRI, preceded by '^^' (U+005E U+005E). The datatype IRI in Turtle may be written using either an absolute IRI, a relative IRI, or prefixed name. If there is no datatype IRI and no language tag, the datatype is xsd:string.
'\' (U+005C) may not appear in any quoted literal except as part of an escape sequence. Other restrictions depend on the delimiter:
Literals delimited by ' (U+0027), may not contain the characters ', LF (U+000A), or CR (U+000D).
Literals delimited by ", may not contain the characters ", LF, or CR.
Literals delimited by ''' may not contain the sequence of characters '''.
Literals delimited by """ may not contain the sequence of characters """.
Numbers
Numbers can be written like other literals with lexical form and datatype (e.g.
"-5.0"^^xsd:decimal
). Turtle has a shorthand syntax for writing integer values, arbitrary precision decimal values, and double precision floating point values.
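The shorthand forms can be sketched as follows; the ex: vocabulary and its property names are hypothetical.

```turtle
@prefix ex: <http://example.org/stats#> .          # hypothetical vocabulary
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:sample ex:count -5 .                    # shorthand for "-5"^^xsd:integer
ex:sample ex:ratio -5.0 .                  # shorthand for "-5.0"^^xsd:decimal
ex:sample ex:size 4.2E9 .                  # shorthand for "4.2E9"^^xsd:double
ex:sample ex:count "-5"^^xsd:integer .     # the same literal as in the first line
```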
| Data Type | Abbreviated | Lexical | Description |
|---|---|---|---|
| xsd:integer | -5 | "-5"^^xsd:integer | Integer values may be written as an optional sign and a series of digits. Integers match the regular expression "[+-]?[0-9]+". |
| xsd:decimal | -5.0 | "-5.0"^^xsd:decimal | Arbitrary-precision decimals may be written as an optional sign, zero or more digits, a decimal point and one or more digits. Decimals match the regular expression "[+-]?[0-9]*\.[0-9]+". |
| xsd:double | 4.2E9 | "4.2E9"^^xsd:double | Double-precision floating point values may be written as an optionally signed mantissa with an optional decimal point, the letter "e" or "E", and an optionally signed integer exponent. The exponent matches the regular expression "[+-]?[0-9]+" and the mantissa one of these regular expressions: "[+-]?[0-9]+\.[0-9]+", "[+-]?\.[0-9]+" or "[+-]?[0-9]+". |
Booleans
Boolean values may be written as either 'true' or 'false' (case-sensitive) and represent RDF literals with the datatype xsd:boolean.
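A short sketch, with a hypothetical ex: vocabulary:

```turtle
@prefix ex: <http://example.org/stats#> .          # hypothetical vocabulary
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:sample ex:isPublic true .                      # shorthand for "true"^^xsd:boolean
ex:sample ex:isArchived "false"^^xsd:boolean .    # long form of the boolean false
```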
RDF Blank Nodes
RDF blank nodes in Turtle are expressed as _: followed by a blank node label which is a series of name characters.
The characters in the label are built upon PN_CHARS_BASE, liberalized as follows:
The characters _ and digits may appear anywhere in a blank node label.
The character . may appear anywhere except the first or last character.
The characters -, U+00B7, U+0300 to U+036F and U+203F to U+2040 are permitted anywhere except the first character.
A fresh RDF blank node is allocated for each unique blank node label in a document.
Repeated use of the same blank node label identifies the same RDF blank node.
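For instance, in the following sketch the labels alice and bob each denote one blank node throughout the document:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:alice foaf:knows _:bob .
_:bob foaf:knows _:alice .    # same two blank nodes as in the line above
```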
Nesting Unlabeled Blank Nodes in Turtle
In Turtle, fresh RDF blank nodes are also allocated when matching the production blankNodePropertyList and the terminal ANON.
Both of these may appear in the subject or object position of a triple (see the Turtle Grammar).
That subject or object is a fresh RDF blank node.
This blank node also serves as the subject of the triples produced by matching the predicateObjectList production embedded in a blankNodePropertyList.
The generation of these triples is described in Predicate Lists.
Blank nodes are also allocated for collections described below.
The Turtle grammar allows blankNodePropertyLists to be nested.
In this case, each inner [ establishes a new subject blank node which reverts to the outer node at the ], and serves as the current subject for predicate object lists.
The use of predicateObjectList within a blankNodePropertyList is a common idiom for representing a series of properties of a node.
Abbreviated:
Corresponding simple triples:
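A sketch of both forms. The mailbox IRI is a placeholder, and the blank node labels in the second half are arbitrary names chosen for readability. The two halves are shown together only for comparison; read as one document, they would allocate two separate sets of blank nodes.

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Abbreviated: nested blankNodePropertyLists
[ foaf:name "Alice" ] foaf:knows [
    foaf:name "Bob" ;
    foaf:knows [ foaf:name "Eve" ] ;
    foaf:mbox <mailto:bob@example.com>
] .

# Corresponding simple triples
_:a foaf:name "Alice" .
_:a foaf:knows _:b .
_:b foaf:name "Bob" .
_:b foaf:knows _:c .
_:c foaf:name "Eve" .
_:b foaf:mbox <mailto:bob@example.com> .
```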
Collections
RDF provides a Collection [[RDF11-MT]] structure for lists of RDF nodes.
The Turtle syntax for Collections is a possibly empty list of RDF terms enclosed by ().
This collection represents an rdf:first/rdf:rest list structure with the sequence of objects of the rdf:first statements being the order of the terms enclosed by ().
The (…) syntax MUST appear in the subject or object position of a triple (see the Turtle Grammar).
The blank node at the head of the list is the subject or object of the containing triple.
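A brief sketch, with a hypothetical default namespace:

```turtle
@prefix : <http://example.org/foo#> .   # hypothetical namespace

:subject :predicate ( :a :b :c ) .      # the object is the head of a three-element list
:subject :predicate2 () .               # the empty collection, i.e. rdf:nil
```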
Examples
This example is a Turtle translation of example 7 in the RDF/XML Syntax specification (example1.ttl):
An example of an RDF collection of two literals.
which is short for (example2.ttl):
An example of two identical triples containing literal objects
containing newlines, written in plain and long literal forms.
The line breaks in this example are LINE FEED characters (U+000A).
(example3.ttl):
As indicated by the grammar, a
collection
can be either a
subject
or an
object
. This subject or object will be the novel blank node for the first object, if the collection has one or more objects, or
rdf:nil
if the collection is empty.
For example,
is syntactic sugar for (noting that the blank nodes b0, b1 and b2 do not occur anywhere else in the RDF graph):
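A sketch of a collection in subject position followed by its expansion. The default namespace is a hypothetical one, and the labels b0, b1 and b2 stand for the freshly allocated blank nodes.

```turtle
@prefix : <http://example.org/stuff/1.0/> .   # hypothetical namespace
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Abbreviated: a three-element collection used as a subject
(1 2.0 3E1) :p "w" .

# Expanded: one blank node per element, linked by rdf:rest
_:b0 rdf:first 1 ;
     rdf:rest _:b1 .
_:b1 rdf:first 2.0 ;
     rdf:rest _:b2 .
_:b2 rdf:first 3E1 ;
     rdf:rest rdf:nil .
_:b0 :p "w" .
```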
RDF collections can be nested and can involve other syntactic forms:
is syntactic sugar for:
Turtle compared to SPARQL
The SPARQL 1.1 Query Language (SPARQL) [[SPARQL11-QUERY]] uses a Turtle style syntax for its TriplesBlock production.
This production differs from the Turtle language in that:
SPARQL permits RDF Literals as the subject of RDF triples.
SPARQL permits variables (?name or $name) in any part of the triple.
Turtle allows
prefix and base declarations
anywhere outside of a triple. In SPARQL, they are only allowed in the
Prologue
(at the start of the SPARQL query).
SPARQL uses case insensitive keywords, except for 'a'. Turtle's @prefix and @base declarations are case sensitive; the SPARQL derived PREFIX and BASE are case insensitive.
'true' and 'false' are case insensitive in SPARQL and case sensitive in Turtle. TrUe is not a valid boolean value in Turtle.
For further information see the
Syntax for IRIs
and
SPARQL Grammar
sections of the SPARQL query document [[SPARQL11-QUERY]].
This specification defines conformance criteria for:
Turtle documents
Turtle parsers
A conforming Turtle document is a Unicode string that conforms to the grammar and additional constraints defined in Turtle Grammar, starting with the turtleDoc production. A Turtle document serializes an RDF Graph.
A conforming Turtle parser is a system capable of reading Turtle documents on behalf of an application. It makes the serialized RDF graph, as defined in Parsing, available to the application, usually through some form of API.
The IRI that identifies the Turtle language is:
This specification does not define how Turtle parsers handle non-conforming input documents.
Media Type and Content Encoding
The media type of Turtle is
text/turtle
The content encoding of Turtle content is always UTF-8. Charset
parameters on the mime type are required until such time as the
text/
media type tree permits UTF-8 to be sent without a
charset parameter. See Internet Media Type, File Extension and Macintosh File Type
for the media type
registration form.
Turtle Grammar
A Turtle document is a
Unicode [[!UNICODE]]
character string encoded in UTF-8.
Unicode characters only in the range U+0000 to U+10FFFF inclusive are
allowed.
White Space
White space (production
WS
) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a Turtle parser.
White space is significant in the production
String
Comments
Comments in Turtle take the form of '#', outside an
IRIREF
or
String
and continue to the end of line (marked by characters U+000D or U+000A)
or end of file if there is no end of line after the comment
marker. Comments are treated as white space.
IRI References
Relative IRIs are resolved with base IRIs as per
Uniform Resource Identifier (URI): Generic Syntax
[[RFC3986]] using only the basic algorithm in section 5.2.
Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) are performed.
Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of
Internationalized Resource Identifiers (IRIs)
[[RFC3987]].
The
@base
or
BASE
directive defines the Base IRI used to resolve relative IRIs per RFC3986 section 5.1.1, "Base URI Embedded in Content".
Section 5.1.2, "Base URI from the Encapsulating Entity" defines how the In-Scope Base IRI may come from an encapsulating document, such as a SOAP envelope with an xml:base directive or a mime multipart document with a Content-Location header.
The "Retrieval URI" identified in section 5.1.3, "Base URI from the Retrieval URI", is the URL from which a particular Turtle document was retrieved.
If none of the above specifies the Base URI, the default Base URI (section 5.1.4, "Default Base URI") is used.
Each
@base
or
BASE
directive sets a new In-Scope Base URI, relative to the previous one.
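The effect of successive base directives can be sketched as follows, assuming a hypothetical starting base under example.org. The resolved IRIs in the comments follow the RFC3986 section 5.2 algorithm.

```turtle
@base <http://example.org/docs/> .
<a> <p> <b> .        # subject is http://example.org/docs/a

@base <v2/> .        # resolved against the previous base: http://example.org/docs/v2/
<a> <p> <b> .        # subject is now http://example.org/docs/v2/a

BASE </archive/>     # absolute-path reference: http://example.org/archive/
<a> <p> <#frag> .    # object is http://example.org/archive/#frag
```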
Escape Sequences
There are three forms of escapes used in Turtle documents:
numeric escape sequences represent Unicode code points:
| Escape sequence | Unicode code point |
|---|---|
| '\u' hex hex hex hex | A Unicode character in the range U+0000 to U+FFFF inclusive corresponding to the value encoded by the four hexadecimal digits interpreted from most significant to least significant digit. |
| '\U' hex hex hex hex hex hex hex hex | A Unicode character in the range U+0000 to U+10FFFF inclusive corresponding to the value encoded by the eight hexadecimal digits interpreted from most significant to least significant digit. |
where HEX is a hexadecimal character
HEX ::= [0-9] | [A-F] | [a-f]
string escape sequences represent the characters traditionally escaped in string literals:
| Escape sequence | Unicode code point |
|---|---|
| '\t' | U+0009 |
| '\b' | U+0008 |
| '\n' | U+000A |
| '\r' | U+000D |
| '\f' | U+000C |
| '\"' | U+0022 |
| '\'' | U+0027 |
| '\\' | U+005C |
reserved character escape sequences
consist of a '\' followed by one of
~.-!$&'()*+,;=/?#@%_
and represent the character to the right of the '\'.
Context where each kind of escape sequence can be used
| | numeric escapes | string escapes | reserved character escapes |
|---|---|---|---|
| IRIs, used as RDF terms or as in @prefix, PREFIX, @base, or BASE declarations | yes | no | no |
| local names | no | no | yes |
| Strings | yes | yes | no |
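The three kinds of escape can be sketched in their permitted contexts as follows; the ex: namespace and the property names are hypothetical.

```turtle
@prefix ex: <http://example.org/ns#> .   # hypothetical namespace

# Numeric escapes are allowed in IRIs and in strings: \u00E9 is U+00E9
<http://example.org/caf\u00E9> ex:label "caf\u00E9" .

# String escapes are allowed only in strings
ex:s ex:text "tab:\t quote:\" backslash:\\ newline:\n" .

# Reserved character escapes are allowed only in local names
ex:lat\-long ex:p ex:a\,b .

# Eight-digit numeric escape for a code point above U+FFFF
ex:s ex:symbol "\U0001F600" .
```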
%-encoded sequences are in the
character range for IRIs
and are
explicitly allowed
in local names. These appear as a '%' followed by two hex characters and represent that same sequence of three characters. These sequences are
not
decoded during processing. A term written as
in Turtle designates the IRI
and not IRI
. A term written as
ex:%66oo-bar
with a prefix
@prefix ex:
also designates the IRI
Grammar
The
EBNF
used here is defined in XML 1.0
[[!EBNF-NOTATION]]. Production labels consisting of a
number and a final 's', e.g. [
60s
], reference the production
with that number in the
SPARQL
1.1 Query Language grammar
[[SPARQL11-QUERY]].
Notes:
Keywords in single quotes ('@base', '@prefix', 'a', 'true', 'false') are case-sensitive. Keywords in double quotes ("BASE", "PREFIX") are case-insensitive.
Escape sequences UCHAR and ECHAR are case sensitive.
When tokenizing the input and choosing grammar rules, the longest match is chosen.
The Turtle grammar is LL(1) and LALR(1) when the rules with uppercased names are used as terminals.
The entry point into the grammar is turtleDoc.
In signed numbers, no white space is allowed between the sign and the number.
The [162s] ANON ::= '[' WS* ']' token allows any amount of white space and comments between []s. The single space version is used in the grammar for clarity.
The strings '@prefix' and '@base' match the pattern for LANGTAG, though neither "prefix" nor "base" are registered language subtags. This specification does not define whether a quoted literal followed by either of these tokens (e.g. "A"@base) is in the Turtle language.
Parsing
The RDF 1.1 Concepts and Abstract Syntax specification [[!RDF11-CONCEPTS]] defines three types of RDF Term: IRIs, literals and blank nodes.
Literals are composed of a lexical form and an optional language tag [[!BCP47]] or datatype IRI.
An extra type, prefix, is used during parsing to map string identifiers to namespace IRIs.
This section maps a string conforming to the grammar in Grammar to a set of triples by mapping strings matching productions and lexical tokens to RDF terms or their components (e.g. language tags, lexical forms of literals). Grammar productions change the parser state and emit triples.
Parser State
Parsing Turtle requires a state of five items:
IRI baseURI: When the base production is reached, the second rule argument, IRIREF, is the base URI used for relative IRI resolution.
Map[prefix -> IRI] namespaces: The second and third rule arguments (PNAME_NS and IRIREF) in the prefixID production assign a namespace name (IRIREF) for the prefix (PNAME_NS). Outside of a prefixID production, any PNAME_NS is substituted with the namespace. Note that the prefix may be an empty string, per the PNAME_NS production: (PN_PREFIX)? ":".
Map[string -> blank node] bnodeLabels: A mapping from string to blank node.
RDF_Term curSubject: The curSubject is bound to the subject production.
RDF_Term curPredicate: The curPredicate is bound to the verb production. If the token matched was "a", curPredicate is bound to the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type.
RDF Term Constructors
This table maps productions and lexical tokens to RDF terms or components of RDF terms listed in Parsing:
| production | type | procedure |
|---|---|---|
| IRIREF | IRI | The characters between "<" and ">" are taken, with the numeric escape sequences unescaped, to form the unicode string of the IRI. Relative IRI resolution is performed per Section 6.3. |
| PNAME_NS | prefix | When used in a prefixID or sparqlPrefix production, the prefix is the potentially empty unicode string matching the first argument of the rule; it is a key into the namespaces map. |
| PNAME_NS | IRI | When used in a PrefixedName production, the iri is the value in the namespaces map corresponding to the first argument of the rule. |
| PNAME_LN | IRI | A potentially empty prefix is identified by the first sequence, PNAME_NS. The namespaces map MUST have a corresponding namespace. The unicode string of the IRI is formed by unescaping the reserved characters in the second argument, PN_LOCAL, and concatenating this onto the namespace. |
| STRING_LITERAL_SINGLE_QUOTE | lexical form | The characters between the outermost "'"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| STRING_LITERAL_QUOTE | lexical form | The characters between the outermost '"'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| STRING_LITERAL_LONG_SINGLE_QUOTE | lexical form | The characters between the outermost "'''"s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| STRING_LITERAL_LONG_QUOTE | lexical form | The characters between the outermost '"""'s are taken, with numeric and string escape sequences unescaped, to form the unicode string of a lexical form. |
| LANGTAG | language tag | The characters following the @ form the unicode string of the language tag. |
| RDFLiteral | literal | The literal has a lexical form of the first rule argument, String. If the '^^' iri rule matched, the datatype is iri and the literal has no language tag. If the LANGTAG rule matched, the datatype is rdf:langString and the language tag is LANGTAG. If neither matched, the datatype is xsd:string and the literal has no language tag. |
| INTEGER | literal | The literal has a lexical form of the input string, and a datatype of xsd:integer. |
| DECIMAL | literal | The literal has a lexical form of the input string, and a datatype of xsd:decimal. |
| DOUBLE | literal | The literal has a lexical form of the input string, and a datatype of xsd:double. |
| BooleanLiteral | literal | The literal has a lexical form of the true or false, depending on which matched the input, and a datatype of xsd:boolean. |
| BLANK_NODE_LABEL | blank node | The string matching the second argument, PN_LOCAL, is a key in bnodeLabels. If there is no corresponding blank node in the map, one is allocated. |
| ANON | blank node | A blank node is generated. |
| blankNodePropertyList | blank node | A blank node is generated. Note the rules for blankNodePropertyList in the next section. |
| collection | blank node | For non-empty lists, a blank node is generated. Note the rules for collection in the next section. |
| collection | IRI | For empty lists, the resulting IRI is rdf:nil. Note the rules for collection in the next section. |
RDF Triples Constructors
A Turtle document defines an RDF graph composed of a set of RDF triples.
The subject production sets the curSubject.
The verb production sets the curPredicate.
Each object N in the document produces an RDF triple: curSubject curPredicate N.
Property Lists:
Beginning the blankNodePropertyList production records the curSubject and curPredicate, and sets curSubject to a novel blank node B.
Finishing the blankNodePropertyList production restores curSubject and curPredicate.
The node produced by matching blankNodePropertyList is the blank node B.
Collections:
Beginning the collection production records the curSubject and curPredicate.
Each object in the collection production has a curSubject set to a novel blank node B and a curPredicate set to rdf:first.
Each object object_n after the first produces a triple: object_(n-1) rdf:rest object_n.
Finishing the collection production creates an additional triple curSubject rdf:rest rdf:nil and restores curSubject and curPredicate.
The node produced by matching collection is the first blank node B for non-empty lists and rdf:nil for empty lists.
Parsing Example
The following informative example shows the semantic actions performed when parsing this Turtle document with an LALR(1) parser:
Map the prefix
ericFoaf
to the IRI
Map the empty prefix to the IRI
Assign
curSubject
the IRI
Assign
curPredicate
the IRI
Emit an RDF triple:
<...rdf#ericP>
<.../givenName>
"Eric"
Assign
curPredicate
the IRI
Emit an RDF triple:
<...rdf#ericP>
<.../knows>
<...who/dan-brickley>
Emit an RDF triple:
<...rdf#ericP>
<.../knows>
_:1
Save curSubject and reassign to the blank node _:1.
Save curPredicate.
Assign
curPredicate
the IRI
Emit an RDF triple:
_:1
<.../mbox>
Restore
curSubject
and
curPredicate
to their saved values (
<...rdf#ericP>
<.../knows>
).
Emit an RDF triple:
<...rdf#ericP>
<.../knows>
Embedding Turtle in HTML documents
HTML [[HTML5]] script tags can be used to embed data blocks in documents. Turtle can be easily embedded in HTML this way.
Turtle content should be placed in a script tag with the type attribute set to text/turtle. < and > symbols do not need to be escaped inside of script tags. The character encoding of the embedded Turtle
will match the HTML document's encoding.
XHTML
Like JavaScript, Turtle authored for HTML (text/html) can break when used in XHTML (application/xhtml+xml). The solution is the same one used for JavaScript.
When embedded in XHTML, Turtle data blocks must be enclosed in CDATA sections. Those CDATA markers must be in Turtle comments. If the character sequence "]]>" occurs in the document it must be escaped using numeric escape sequences (\u005d\u005d\u003e). This will also make Turtle safe in polyglot documents served as both text/html and application/xhtml+xml. Failing to use CDATA sections or escape "]]>" may result in a non well-formed XML document.
Parsing Turtle in HTML
There are no syntactic or grammar differences between parsing Turtle that has been embedded
and normal Turtle documents. A Turtle document parsed from an HTML DOM will be a stream of character data rather than a stream of UTF-8 encoded bytes. No decoding is necessary if the HTML document has already been parsed into DOM. Each script data block is considered to be its own Turtle document. @prefix and @base declarations in a Turtle data block are scoped to that data block and do not affect other data blocks.
The HTML lang attribute or XHTML xml:lang attribute has no effect on the parsing of the data blocks.
The base URI of the encapsulating HTML document provides a "Base URI Embedded in Content" per RFC3986 section 5.1.1.
Internet Media Type, File Extension and Macintosh File Type
Contact:
Eric Prud'hommeaux
See also:
How to Register a Media Type for a W3C Specification
Internet Media Type registration, consistency of use
TAG Finding 3 June 2002 (Revised 4 September 2002)
The Internet Media Type / MIME Type for Turtle is "text/turtle".
It is recommended that Turtle files have the extension ".ttl" (all lowercase) on all platforms.
It is recommended that Turtle files stored on Macintosh HFS file systems be given a file type of "TEXT".
The information that follows has been
submitted to the IESG
for review, approval, and registration with IANA.
Type name:
text
Subtype name:
turtle
Required parameters:
None
Optional parameters:
charset
— this parameter is required when transferring non-ASCII data. If present, the value of
charset
is always
UTF-8
Encoding considerations:
The syntax of Turtle is expressed over code points in Unicode [[!UNICODE]]. The encoding is always UTF-8 [[!UTF-8]].
Unicode code points may also be expressed using an \uXXXX (U+0000 to U+FFFF) or \UXXXXXXXX syntax (for U+10000 onwards) where X is a hexadecimal digit [0-9A-Fa-f]
Security considerations:
Turtle is a general-purpose assertion language; applications may evaluate given data to infer more assertions or to dereference IRIs, invoking the security considerations of the scheme for that IRI. Note in particular, the privacy issues in [[!RFC3023]] section 10 for HTTP IRIs. Data obtained from an inaccurate or malicious data source may lead to inaccurate or misleading conclusions, as well as the dereferencing of unintended IRIs. Care must be taken to align the trust in consulted resources with the sensitivity of the intended use of the data; inferences of potential medical treatments would likely require different trust than inferences for trip planning.
Turtle is used to express arbitrary application data; security considerations will vary by domain of use. Security tools and protocols applicable to text (e.g. PGP encryption, MD5 sum validation, password-protected compression) may also be used on Turtle documents. Security/privacy protocols must be imposed which reflect the sensitivity of the embedded information.
Turtle can express data which is presented to the user, for example, RDF Schema labels. Application rendering strings retrieved from untrusted Turtle documents must ensure that malignant strings may not be used to mislead the reader. The security considerations in the media type registration for XML ([[!RFC3023]] section 10) provide additional guidance around the expression of arbitrary data and markup.
Turtle uses IRIs as term identifiers. Applications interpreting data expressed in Turtle should address the security issues of
Internationalized Resource Identifiers (IRIs)
[[!RFC3987]] Section 8, as well as
Uniform Resource Identifier (URI): Generic Syntax
[[!RFC3986]] Section 7.
Multiple IRIs may have the same appearance. Characters in different scripts may
look similar (a Cyrillic "о" may appear similar to a Latin "o"). A character followed
by combining characters may have the same visual representation as another character
(LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT has the same visual representation
as LATIN SMALL LETTER E WITH ACUTE).
Any person or application that is writing or interpreting data in Turtle must take care to use the IRI that matches the intended semantics, and avoid IRIs that may look similar.
Further information about matching of similar characters can be found
in
Unicode Security
Considerations
[[UNICODE-SECURITY]] and
Internationalized Resource
Identifiers (IRIs)
[[RFC3987]] Section 8.
Interoperability considerations:
There are no known interoperability issues.
Published specification:
This specification.
Applications which use this media type:
No widely deployed applications are known to use this media type. It may be used by some web services and clients consuming their data.
Additional information:
Magic number(s):
Turtle documents may have the strings '@prefix' or '@base' (case sensitive) or the strings 'PREFIX' or 'BASE' (case insensitive) near the beginning of the document.
File extension(s):
".ttl"
Base URI:
The Turtle '@base
Macintosh file type code(s):
"TEXT"
Person & email address to contact for further information:
Eric Prud'hommeaux
Intended usage:
COMMON
Restrictions on usage:
None
Author/Change controller:
The Turtle specification is the product of the RDF WG. The W3C reserves change control over this specification.
Acknowledgements
This work was described in the paper
New Syntaxes for RDF
which discusses other RDF syntaxes and the background
to the Turtle (Submitted to WWW2004, referred to as
N-Triples
Plus
there).
This work was started during the
Semantic Web Advanced Development Europe (SWAD-Europe)
project funded by the EU IST-7 programme IST-2001-34732 (2002-2004)
and further development supported by the
Institute for Learning and Research Technology
at the
University of Bristol
, UK (2002-Sep 2005).
Valuable contributions to this version were made by Gregg
Kellogg, Andy Seaborn, Sandro Hawke and the members of the RDF Working Group.
The document was improved through the review process by the wider community.
Change Log
Changes since
January
2014 Proposed Recommendation
Missing prefix added in example 11 in response to
comment
from Lars Svensson
Error
in grammar productions [21] and [23] fixed.
Error
in grammar productions [24] and [25] fixed.
Changes from
February
2013 Candidate Recommendation
to
January
2014 Proposed Recommendation
The addition of
sparqlPrefix
and
sparqlBase
which allow for using SPARQL style
BASE
and
PREFIX
directives in a Turtle document was marked "at risk" in the Candidate Recommendation publication. This feature is no longer at risk.
The title of this document was changed from "Turtle" to "RDF 1.1 Turtle".
Removed the obsolete links to tests in
Sec. 7.1
Changes from
August 2011 First Public Working Draft
to
Candidate Recommendation
Renaming of STRING_* productions to STRING_LITERAL_QUOTE style names rather than numbers
Local part of prefix names can now include ":"
Turtle in HTML
Renaming of grammar tokens and rules around IRIs
Reserved character escape sequences
String escape sequences limited to strings
Numeric escape sequences limited to IRIs and Strings
Support top-level blank-predicate-object lists
Whitespace required between @prefix and prefix label
Changes from
January 2008 Team Submission
to
First Public Working Draft
Adopted three additional string syntaxes from SPARQL:
STRING_LITERAL2
STRING_LITERAL_LONG1
STRING_LITERAL_LONG2
Adopted SPARQL's syntax for prefixed names (see
editor's draft
):
'.'s in names in all positions of a local name apart from the first or last, e.g.
ex:first.name
digits in the first character of the
PN_LOCAL
lexical token, e.g.
ex:7tm
adopted SPARQL's IRI resolution and prefix substitution text.
explicitly allowed re-use of the same prefix.
Added
parsing rules
See also the
pre-W3C Submission changelog