Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a framework for representing information in the Web.
This document defines an abstract syntax on which RDF is based, and which serves to link its concrete syntax to its formal semantics. It also includes discussion of design goals, meaning of RDF documents, key concepts, datatyping, character normalization and handling of URI references.
This is an editors' draft despite anything else said here; (if I overstep the mark it might be an editor's draft!).
This is a W3C RDF Core Working Group Last Call Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).
This document is in the Last Call review period, which ends on 31 January 2003. This document has been endorsed by the RDF Core Working Group.
This document is being released for review by W3C Members and other interested parties to encourage feedback and comments.
This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
In conformance with W3C policy requirements, known patent and IPR constraints associated with this Working Draft are detailed on the RDF Core Working Group Patent Disclosure page.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
Normative documentation of the RDF core falls into the following areas:
Within this document normative sections are explicitly labelled as such.
The framework is designed so that vocabularies can be layered on top of this core. RDF vocabulary definition language (RDF schema) [RDF-VOCABULARY] is the first such vocabulary. Others (cf. OWL [OWL] and the applications in the primer [RDF-PRIMER]) are in development.
In section 2, some background to the design goals and rationale of RDF is presented. There is also some discussion of the intended implications of publishing an RDF document (section 2.4).
RDF's abstract syntax is a graph, which can be serialized using XML (but which is quite distinct from XML's tree-based infoset [XML-INFOSET]). The abstract syntax captures the fundamental structure of RDF, independently of any concrete syntax used for serialization. The formal semantics of RDF are defined in terms of the abstract syntax. XML content of literals is described in section 3, and the abstract syntax is defined in section 4 of this document.
Section 5 discusses fragment identifier use.
RDF has an abstract syntax that reflects a simple graph-based data model, and formal semantics with a rigorously defined notion of entailment providing a basis for well founded deductions in RDF data.
The development of RDF has been motivated by the following uses, among others:
RDF is designed to represent information in a minimally constraining, flexible way. It can be used in isolated applications, where individually designed formats may be more perspicuous, but RDF's generality offers greater value from sharing. The value of information thus increases as it becomes accessible to more applications across the entire Internet.
The design of RDF is intended to meet the following goals:
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
Note: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] for more information about "model theory" as used in the literature of mathematics and logic.
RDF has a formal semantics which provides a dependable basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URI references, or URIrefs). URI references are used for naming all kinds of things in RDF.
The other kind of value that appears in RDF data is a literal.
RDF has a recommended XML serialization form [RDF-SYNTAX], which can be used to encode the data model for exchange of information among applications.
RDF can use values represented according to XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make simple assertions about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making assertions that are nonsensical or inconsistent with the world as people see it, and applications that build upon RDF must find ways to deal with incomplete and conflicting sources of information. (This is where RDF departs from more prescriptive approaches to representing data in XML, which aim to present information that is well-formed and complete for an application's needs.)
RDF can represent arbitrary information that can be expressed as simple facts. (What constitutes a simple fact is discussed later, in section 2.3.5)
RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.
This goal is explored further in section 2.4 below.
RDF uses the following key concepts:
The underlying structure of any expression in RDF can be viewed as a directed labelled graph, which consists of nodes and labelled directed arcs that link pairs of nodes (these notions are defined more formally in section 4). The RDF graph is a set of triples:
Each property arc represents a statement of a relationship between the nodes that it links, having three parts:
The direction of the arc is significant: it always points toward the object of a statement.
The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
Nodes in an RDF graph are URIs with optional fragment identifiers (URI references, or URIrefs), literals, or blank (having no separate form of identification). Arcs are labelled with URI references. (See [URI], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 4.1.)
The URI reference or literal on a node identifies what that node represents. The label on an arc identifies the relationship between the nodes connected by the arc. The arc label may also be a node in the graph.
A blank node is an RDF graph node that is not a URI reference or a literal. In the RDF abstract syntax, a blank node is just a unique node that can be used in one or more RDF statements, and has no globally distinguishing identity.
A convention used by some linear representations of an RDF graph to allow several statements to reference the same blank node is to use a blank node identifier, which is a local identifier that can be distinguished from all URIs and literals. When graphs are merged, their blank nodes must be kept distinct if meaning is to be preserved; this may call for re-allocation of blank node identifiers.
Note that blank node identifiers are not part of the RDF abstract syntax, and the representation of statements that use blank nodes is entirely dependent on the particular concrete syntax used.
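The re-allocation of blank node identifiers during a merge can be illustrated with a minimal Python sketch. It assumes a hypothetical linear representation in which each triple is a tuple of strings and a blank node identifier is any string beginning with "_:"; neither convention is part of the RDF abstract syntax, and the function name `merge` is illustrative only.

```python
from itertools import count


def _rename(term, renaming, fresh):
    """Map a local blank node identifier to a fresh one; leave other terms alone."""
    if isinstance(term, str) and term.startswith("_:"):
        if term not in renaming:
            renaming[term] = f"_:b{next(fresh)}"
        return renaming[term]
    return term


def merge(*graphs):
    """Merge graphs (sets of (s, p, o) tuples), keeping their blank nodes distinct.

    Assumption of this sketch: blank node identifiers are strings that start
    with "_:" and are local to the graph in which they appear.
    """
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # one renaming table per source graph
        for s, p, o in graph:
            merged.add((_rename(s, renaming, fresh), p, _rename(o, renaming, fresh)))
    return merged


g1 = {("_:x", "http://example.org/name", "Alice")}
g2 = {("_:x", "http://example.org/name", "Bob")}
merged = merge(g1, g2)
```

Both input graphs happen to use the identifier `_:x`, but after the merge the two statements have different subjects, so they are not mistaken for statements about the same node.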
Datatypes are used by RDF in the representation of values such as integers, floating point numbers and dates.
RDF uses the datatype abstraction defined by XML Schema Part 2: Datatypes [XML-SCHEMA2]. A datatype consists of a lexical space, a value space and a datatype mapping.
A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype:
With one exception, the datatypes used in RDF have a lexical space consisting of a set of strings. The exception is rdf:XMLLiteral, whose lexical space also includes pairs of strings and language identifiers. The value obtained through its datatype mapping may depend on the language identifier.
For example, the datatype mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Value Space | {T, F} |
---|---|
Lexical Space | {"0", "1", "true", "false"} |
Datatype Mapping | {<"true", T>, <"1", T>, <"0", F>, <"false", F>} |
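The three parts of a datatype can be written down directly. The following Python sketch is illustrative only: the class name `Datatype`, its field names and the helper `is_consistent` are assumptions of this example, and Python's `True` and `False` stand in for the values T and F.

```python
from dataclasses import dataclass


@dataclass
class Datatype:
    """A datatype as a lexical space, a value space and a datatype mapping."""
    lexical_space: frozenset
    value_space: frozenset
    mapping: dict  # lexical form -> value (the set of pairs, stored as a dict)


# xsd:boolean as given in the table above; True/False stand in for T/F.
XSD_BOOLEAN = Datatype(
    lexical_space=frozenset({"0", "1", "true", "false"}),
    value_space=frozenset({True, False}),
    mapping={"true": True, "1": True, "0": False, "false": False},
)


def is_consistent(datatype):
    """Check that every pair goes from the lexical space into the value space."""
    return (
        set(datatype.mapping) == set(datatype.lexical_space)
        and set(datatype.mapping.values()) <= set(datatype.value_space)
    )
```

Storing the mapping as a dictionary reflects that each lexical form is paired with exactly one value, while several lexical forms may share a value.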
RDF predefines just one datatype, rdf:XMLLiteral, used for embedding XML in RDF (see section 3).
There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with URI references. The predefined XML Schema datatypes [XML-SCHEMA2] are expected to be widely used for this purpose. The defining authority of a URI which identifies a datatype is responsible for specifying the datatype's lexical space, value space and datatype mapping.
RDF provides no mechanism for defining new datatypes. XML Schema Datatypes [XML-SCHEMA2] provides an extensibility framework suitable for defining new datatypes for use in RDF.
Literals are used to identify values such as numbers and dates by means of a lexical representation. Anything represented by a literal could also be represented by a URI, but it is often more convenient or intuitive to use literals.
A literal may be the object of an RDF statement, but not the subject or the arc.
Literals may be plain or typed:
Continuing the example from section 2.3.3, the typed literals which can be defined using the XML Schema datatype xsd:boolean are:
Typed Literal | Datatype Mapping | Value |
---|---|---|
<xsd:boolean, "true"> | <"true", T> | T |
<xsd:boolean, "1"> | <"1", T> | T |
<xsd:boolean, "false"> | <"false", F> | F |
<xsd:boolean, "0"> | <"0", F> | F |
Some simple facts indicate a relationship between two objects. Each such fact may be represented as an RDF triple in which the predicate names the relationship between the subject and the object. A familiar representation of such a fact might be as a row in a table in a relational database. The table has two columns, corresponding to the subject and the object of the RDF triple. The name of the table corresponds to the predicate of the RDF triple. A further familiar representation may be as a two place predicate in first order logic.
Relational databases allow for tables with an arbitrary number of columns. First order logic allows predicates with an arbitrary number of places. A row in such a table, or such a predicate, has to be decomposed before being represented as RDF triples. Usually, such a decomposition introduces a new blank node, corresponding to the row. A new triple is introduced for each cell in the row. The predicate of the triple corresponds to the column name. The blank node is usually the subject of the triple. The value in the cell usually corresponds to the object of the triple. Sometimes a further triple, with predicate rdf:type, is introduced. This has the new blank node as subject, and its object corresponds to the table name.
As an example, consider Figure 5 from the [RDF-PRIMER]:
This information might correspond to a row in a table "STAFFADDRESSES", with columns STAFFID, STREET, STATE, CITY and ZIP.
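The decomposition described above can be sketched in Python as follows. The vocabulary prefix `http://example.org/terms#`, the "_:" blank node convention, the function name `row_to_triples` and the cell values are all hypothetical; only the table and column names come from the text.

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
_row_ids = count()


def row_to_triples(table, row, vocab="http://example.org/terms#"):
    """Decompose one table row (a dict of column -> cell value) into triples.

    A new blank node stands for the row. Each cell yields one triple whose
    predicate is derived from the column name, and a further rdf:type triple
    records the table name. The vocab prefix is a made-up example namespace.
    """
    node = f"_:row{next(_row_ids)}"
    triples = {(node, vocab + column, value) for column, value in row.items()}
    triples.add((node, RDF_TYPE, vocab + table))
    return triples


# Placeholder cell values, not taken from the primer.
triples = row_to_triples(
    "STAFFADDRESSES",
    {
        "STAFFID": "0001",
        "STREET": "1 Example Street",
        "STATE": "Example State",
        "CITY": "Example City",
        "ZIP": "00000",
    },
)
```

The five cells give five triples, and the rdf:type triple brings the total to six, all sharing the same blank node as subject.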
Conjunction (logical-AND) of statements can be used to express more complex facts. RDF does not provide means to express negation (NOT) or disjunction (OR). The expressive power of RDF corresponds to the existential-conjunctive (EC) subset of first order logic [Sowa].
Through its use of extensible URI-based vocabularies, RDF provides for expression of facts about arbitrary subjects; i.e. assertions of named properties about specific named things. A URI can be constructed for any thing that can be named, so RDF facts can be about any such things.
The ideas on meaning and inference in RDF are underpinned by the formal concept of entailment, as discussed in the RDF semantics document [RDF-SEMANTICS]. In brief, an RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if we presume or demonstrate the truth of A then we can also infer the truth of B.
There are two aspects to the meaning of an RDF graph. There is the formal meaning as determined by the RDF semantics [RDF-SEMANTICS]. This determines, with mathematical precision, the conclusions that can logically be drawn from an RDF graph. There is also the social meaning of the graph. It is the social meaning that affects what it means to people and how it interacts with human social institutions such as our systems of law.
RDF/XML expressions, i.e. encodings of RDF graphs, can be used to make claims or assertions about the 'real' world. Such expressions are said to be asserted.
Not every RDF/XML expression is asserted. Some may convey meaning that is partly determined by the circumstances in which they are used. For example, in English, a statement "I don't believe that George is a clown" contains the words "George is a clown", which, considered in isolation, has the form of an assertion that George exhibits certain comic qualities. However, considering the whole sentence, no such assertion is considered to be made.
When an RDF graph is asserted in the Web, its publisher is saying something about their view of the world. Such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provides the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and that distinguish assertions from other uses (e.g. citations, denials or illustrations).
The technical machinery includes protocols for transferring information (e.g. HTTP, SMTP) and file formats for encapsulating and labelling information (e.g. MIME, XML). A media type, application/rdf+xml [RDF-MIME-TYPE] indicates the use of RDF/XML as distinct from some other XML that happens to look like RDF. Issuing an HTTP GET request and obtaining data with a "200 OK" response code is a technical indication that the received data was published at the request URI; but data received with a "404 Not found" response cannot be considered to be similarly published information.
The social machinery includes the form of publication: publishing some unqualified statements on one's World Wide Web home page would generally be taken as an assertion of those statements. But publishing the same statements with a qualification, such as "here are some common myths", or as part of a rebuttal, would likely not be construed as an assertion of the truth of those statements. Similar considerations apply to the publication of assertions expressed in RDF.
An RDF graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information, or programmers writing software to perform specialized forms of deduction in the Semantic Web.
[[ This text is still drafty. ]]
Human publishers of RDF content commit themselves to the mechanically-inferred social obligations.
The meaning of an RDF document includes both the social meaning and the formal meaning. Moreover, both the social meaning of the formal entailments and the formal entailments of social meanings are part of the social meaning.
When a pair of RDF graphs G and G' both logically entail the other, then asserting G is equivalent to asserting G'. In such a case, the implicit assertion of G' should be interpreted using the same social conventions that are reasonably used to interpret the explicit assertion of G.
The logical entailment intended with content of media type application/rdf+xml [RDF-MIME-TYPE] is that defined in the RDF Semantics [RDF-SEMANTICS], as XSD entailment, i.e. respecting the RDF vocabulary, the RDFS vocabulary and the XML Schema datatypes.
Information within such content, or the use of a different media type, may indicate the use of a semantic extension to RDF. When such an extension is indicated, this usually indicates that the stronger formal entailment associated with that extension is intended as part of the meaning of the RDF.
The social conventions surrounding use of RDF include the idea that each URI 'belongs to' somebody who has authority and responsibility for defining its meaning. The social conventions are rooted in the URI specification [RFC2396] and registration procedures [RFC2717]. A URI scheme registration refers to a specification of the detailed syntax and interpretation for that scheme, from which the defining authority for a given URI may be deduced. In the case of http: URIs, the defining specification is the HTTP protocol specification [RFC2616], which obtains a resource representation from the host named in the URI; thus, the owner of the host's DNS domain controls (observable aspects of) the URI's meaning.
Publication of RDF, when considered as a social act, constitutes a publication of some content that is defined by whatever normal social conditions are used by the publishers of any terms in the RDF to define the meanings of those terms, even if those meanings and definitions are not accessible to the formal semantics of RDF; and, moreover, those meanings are preserved under any formally sanctioned inference processes.
[[ This text uses informal and/or rhetorical flourishes that should be edited out for maximum accessibility to non-mother tongue readers. ]]
Imagine three websites each publishing some RDF:
(A) http://insult.example.com/lexicon# asserts the following, and this is all that one can find on the website about that term:
A:Clown | rdf:type | rdfs:Class . |
A:Clown | rdfs:comment | "A foolish person, whose pronouncements are probably ill-considered and not to be taken seriously" . |
(B) http://AngloSaxon.example.org/lexicon# asserts:
B:Comic | rdfs:subClassOf | <http://insult.example.com/lexicon#Clown> . |
(C) A third website asserts:
C:JohnSmith | rdf:type | <http://AngloSaxon.example.org/lexicon#Comic> . |
Now, it follows by the formal RDF model theory that these three together entail:
C:JohnSmith | rdf:type | <http://insult.example.com/lexicon#Clown> . |
which the person identified as C:JohnSmith might reasonably consider an insult. Why? Not because of the RDF model theory, which merely says he is in some class about which nothing can be formally inferred. However, the rdfs:comment associated with that class name by the owner of that name provides the insulting content, in the social context of Web publication, even though it cannot be formally inferred via the RDF inference rules.
But who has insulted the identified person? A merely defined the term; B does not mention him in particular, so even A and B together do not constitute a personal insult. And C might argue that although he refers to the person, he only asserts that he is a comic, which is not in itself grounds for a libel suit. However, one could reasonably claim that C is to blame, since C uses not a generic term 'Comic', but a particular URI reference which is defined by its owner (B) in a way which is clearly insulting, since B in turn explicitly refers to, and uses, the term defined by A. Thus, C's use of a B-defined term suggests a clear intent by C to convey a meaning defined by B, by virtue of a definition by A, which is insulting.
By using the specific name http://AngloSaxon.example.org/lexicon#Comic instead of some term defined in, say, a glossary of job descriptions, B has explicitly removed his use of the term 'Clown' from any formal connection with people who are entertainers. In order to succeed in his probable intent of making a generic slander against these people, B should have used a term that was defined by someone else, such as:
<http://entertainers.example.com/glossary#Comic> rdfs:subClassOf <http://insult.example.com/lexicon#Clown> . |
and then if C had also used this first URI reference, then in spite of a similar formal inference chain generating the insulting conclusion about C:JohnSmith, there would be nobody to sue, since now C would indeed have simply made a harmless observation about his occupation, and B's assertion, while indeed arguably offensive, makes no reference to him in particular.
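The formal inference chain used in this example can be sketched as a small Python function that applies only the subclass rule: if x has rdf:type C and C is an rdfs:subClassOf D, then x has rdf:type D. This is an illustration and not a complete RDFS reasoner. The namespace chosen for C (`http://c.example.net/people#`) and the function name are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

A = "http://insult.example.com/lexicon#"
B = "http://AngloSaxon.example.org/lexicon#"
C = "http://c.example.net/people#"  # hypothetical namespace for site C


def type_closure(graph):
    """Repeatedly apply the subclass rule until no new rdf:type triples appear."""
    inferred = set(graph)
    while True:
        new = {
            (x, RDF_TYPE, sup)
            for x, p1, cls in inferred if p1 == RDF_TYPE
            for sub, p2, sup in inferred if p2 == RDFS_SUBCLASS and sub == cls
        } - inferred
        if not new:
            return inferred
        inferred |= new


# The union of what the three sites assert (the rdfs:comment is omitted,
# since it plays no part in the formal inference).
published = {
    (A + "Clown", RDF_TYPE, "http://www.w3.org/2000/01/rdf-schema#Class"),
    (B + "Comic", RDFS_SUBCLASS, A + "Clown"),
    (C + "JohnSmith", RDF_TYPE, B + "Comic"),
}
entailed = type_closure(published)
```

The triple stating that `JohnSmith` has type `Clown` appears in `entailed` although no single site published it, which is the formal half of the situation discussed above.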
RDF assumes that for any URI some individual or organization has the authority to define the meaning of that URI. An RDF predicate is defined by the individual or organization with such authority with respect to its URI.
RDF uses URIs to identify resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned by the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:
Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespaces [XML-NS] associated with the RDF core vocabulary terms.
Note: these namespace URIs are the same as those used in earlier RDF documents [RDF-MS] [RDF-SCHEMA].
[[[NOTE FOR REVIEWERS: Some terms in these namespaces have been deprecated, some have been added, and some RDF schema terms have had their meaning changed. We invite community feedback regarding the relative costs of adopting these changes under the old namespace URIs vs creating new URIs for this revision of RDF.]]]
Vocabulary terms in the rdf: namespace are listed in section 5.1 of the RDF syntax specification [RDF-SYNTAX].
Vocabulary terms defined in the rdfs: namespace are defined in the RDF schema vocabulary specification [RDF-VOCABULARY].
RDF provides for XML content as a possible literal value. This typically originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax [RDF-SYNTAX].
Such content is indicated in an RDF graph using a typed literal whose datatype is a special built-in datatype, rdf:XMLLiteral.
As part of the definition of this datatype, we use an ancillary definition.
The XML document corresponding to a pair ( str, lang ) is formed as follows:
Concatenate the five strings:
Encode the resulting Unicode string in UTF-8 to form the corresponding XML document.
No escaping is applied. The choice of rdf-wrapper is fixed but arbitrary.
The XML document corresponding to a string str is formed as the XML document corresponding to the pair (str, "").
Using this, the datatype rdf:XMLLiteral is defined as follows.
Reminder: All other datatypes have a lexical space that is a set of strings, and a mapping which maps strings to values.
Note: Not all values of this datatype are compliant with XML 1.1 [XML 1.1]. If compliance with XML 1.1 is desired, then only those values that are fully normalized according to XML 1.1 should be used.
This section defines the RDF abstract syntax. The RDF abstract syntax is a set of triples, called the RDF graph.
This section also defines equality between RDF graphs. A definition of equality is needed to support the RDF Test Cases [RDF-TESTS] specification.
Note: Syntactic equality, between RDF graphs, URI references and literals, is often inappropriate for applications. Semantic notions, defined in [RDF-SEMANTICS], such as entailment between graphs, and having the same denotation are usually preferable.
An RDF triple contains three components, called the subject, the predicate and the object.
The subject may not be an RDF literal.
Note: subjects and objects are otherwise unrestricted, since anything that is neither an RDF literal nor an RDF URI reference is treated as a blank node.
An RDF triple is conventionally written in the order subject, predicate, object.
The predicate is also known as the property of the triple.
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph.
The blank nodes of an RDF graph are those nodes that are not RDF literals or RDF URI references.
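These definitions translate almost directly into code. In the following Python sketch the classes `URIRef`, `Literal` and `BNode` are illustrative names chosen for this example and are not defined by RDF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str


class BNode:
    """A blank node: it carries no label and is distinguished only by identity."""


def nodes(graph):
    """The set of subjects and objects of the triples in the graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def blank_nodes(graph):
    """Those nodes that are neither RDF literals nor RDF URI references."""
    return {n for n in nodes(graph) if not isinstance(n, (URIRef, Literal))}


b = BNode()
p = URIRef("http://example.org/p")
graph = {
    (URIRef("http://example.org/s"), p, b),
    (b, p, Literal("x")),
}
```

A predicate counts as a node only if it also occurs as the subject or object of some triple, so `p` is not among `nodes(graph)` here.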
Two RDF graphs G and G' are equal if there is a bijection I between the nodes of the two graphs, such that:
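A minimal sketch of such an equality test in Python follows. It assumes that the bijection must map every URI reference and literal to itself, map blank nodes to blank nodes, and carry each triple of one graph onto a triple of the other. The class `BNode` and the function `graphs_equal` are illustrative names. The search tries every pairing of blank nodes, so it is practical only for small graphs.

```python
from itertools import permutations


class BNode:
    """A blank node, distinguished only by object identity (sketch assumption)."""


def _blank_nodes(graph):
    found = {s for s, _, _ in graph} | {o for _, _, o in graph}
    return [n for n in found if isinstance(n, BNode)]


def graphs_equal(g1, g2):
    """Search for a blank node bijection that maps the triples of g1 onto g2.

    Terms that are not BNode instances are treated as URI references or
    literals and must be mapped to themselves.
    """
    if len(g1) != len(g2):
        return False
    b1, b2 = _blank_nodes(g1), _blank_nodes(g2)
    if len(b1) != len(b2):
        return False
    for candidate in permutations(b2):
        mapping = dict(zip(b1, candidate))
        image = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in g1}
        if image == g2:
            return True
    return False


x, y = BNode(), BNode()
p = "http://example.org/p"
same = graphs_equal({(x, p, "http://example.org/a")}, {(y, p, "http://example.org/a")})
```

Here `same` is true because the two graphs differ only in which blank node they use.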
A URI reference within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:
The disallowed characters that must be %-escaped include all non-ASCII characters and the excluded characters listed in Section 2.4 of [URI], except for the number sign (#), the percent sign (%), and the square bracket characters re-allowed in [RFC-2732].
Disallowed characters must be escaped as follows:
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings.
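The interaction between escaping and equality can be illustrated with a Python sketch. It assumes that each disallowed character is converted to UTF-8 and that every resulting octet is written as %HH. It also assumes that the disallowed ASCII characters are the control characters, the space, and the delimiter and unwise characters of [URI] other than #, %, [ and ]. The function name and the constant are illustrative only.

```python
# Assumed set of disallowed ASCII characters: controls (0x00-0x1F, 0x7F),
# space, and < > " { } | \ ^ `  (but not # % [ ]).
DISALLOWED_ASCII = set('<>"{}|\\^`') | {chr(c) for c in range(0x21)} | {"\x7f"}


def escape_uri_ref(text):
    """%-escape the disallowed characters of an RDF URI reference.

    Each disallowed character is encoded as UTF-8 and every resulting
    octet is written as %HH (upper-case hex is a choice of this sketch).
    """
    out = []
    for ch in text:
        if ord(ch) > 0x7F or ch in DISALLOWED_ASCII:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


unescaped = "http://example.org/caf\u00e9"
escaped = escape_uri_ref(unescaped)
```

Because equality is a character-by-character comparison of the Unicode strings, `unescaped` and `escaped` are different RDF URI references even though escaping one yields the other.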
Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and constrained to be in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).
Note: RDF URI references are compatible with International Resource Identifiers as defined by [XML Namespaces 1.1].
Note: The restriction to absolute URI references is found in this abstract syntax. When there is a well-defined base URI, concrete syntaxes, such as RDF/XML, may permit relative URIs as a shorthand for such absolute URI references.
A literal in an RDF graph contains three components, called the lexical form, the language identifier and the datatype URI.
The lexical form is present in all RDF literals; the language identifier and the datatype URI may be absent from an RDF literal.
A plain literal is one in which the datatype URI is absent.
A typed literal is one in which the datatype URI is present.
Note: Literals in which the lexical form begins with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
Note: When using the language identifier, care must be taken not to confuse language with locale. The language identifier only relates to human language text. Presentational issues, how to best represent typed data to the end-user, should be addressed in end-user applications.
Two literals are equal if and only if all of the following hold:
Note: RDF Literals are distinct and distinguishable from RDF URI references; e.g. http://example.org as an RDF Literal (untyped, without a language identifier) is not equal to http://example.org as an RDF URI reference.
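Literal equality can be sketched in Python with two small value classes. The sketch assumes that two literals are equal when their lexical forms are identical strings, their language identifiers are both absent or both present and identical, and their datatype URIs are both absent or both present and identical. Comparing language identifiers as plain strings is a simplification made here. The class names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    language: Optional[str] = None   # absent when None
    datatype: Optional[str] = None   # absent when None (a plain literal)


XSD_INT = "http://www.w3.org/2001/XMLSchema#int"

plain = Literal("1")
typed = Literal("1", datatype=XSD_INT)
```

The generated equality compares all three components and also requires both operands to be of the same class, so `plain` differs from `typed`, and `Literal("http://example.org")` differs from `URIRef("http://example.org")`.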
The datatype URI refers to a datatype. For XML Schema built-in datatypes, URIs such as http://www.w3.org/2001/XMLSchema#int are used. The URI of the datatype rdf:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which URIs refer to datatypes.
The value associated with a typed literal is found by applying the datatype mapping associated with the datatype URI to the lexical form. This mapping fails if the lexical form is not in the lexical space of the datatype associated with the datatype URI. Exceptionally, if the datatype is rdf:XMLLiteral and the literal has a language identifier, then the datatype mapping is applied to the pair formed by the lexical form and the language identifier.
A typed literal for which the datatype does not map the lexical form to a value is not syntactically ill-formed.
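The lookup of a typed literal's value can be sketched in Python. The registry `DATATYPE_MAPPINGS` and the function `literal_value` are hypothetical names; the only datatype registered is xsd:boolean, with Python's `True` and `False` standing in for its two values. Returning `None` when the mapping fails is a choice of this sketch, made to show that the literal itself remains a legal part of the graph.

```python
XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

# Hypothetical registry: datatype URI -> datatype mapping (lexical form -> value).
DATATYPE_MAPPINGS = {
    XSD_BOOLEAN: {"true": True, "1": True, "false": False, "0": False},
}


def literal_value(lexical_form, datatype_uri):
    """Apply the datatype mapping named by datatype_uri to lexical_form.

    Returns None when the datatype is unknown to this registry or when the
    lexical form is outside the lexical space, i.e. when the mapping fails.
    No exception is raised, because such a literal is still well-formed.
    """
    mapping = DATATYPE_MAPPINGS.get(datatype_uri, {})
    return mapping.get(lexical_form)


value = literal_value("1", XSD_BOOLEAN)
no_value = literal_value("yes", XSD_BOOLEAN)
```

Callers should test the result with `is None` rather than for truthiness, since `False` is a legitimate value.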
[[[Review interaction with model theory concerning typed values.]]]
RDF uses an RDF URI Reference, which may include a fragment identifier, as a context free identifier for a resource. RFC 2396 [URI] states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.
These apparently conflicting views are reconciled by considering that, in an RDF graph, any RDF URI reference consisting of an absolute URI and a fragment identifier identifies the same thing as the fragment identifier does in an application/rdf+xml [RDF-MIME-TYPE] representation of the resource identified by the absolute URI component. Thus:
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior.
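Separating an RDF URI reference into its absolute URI and fragment identifier can be done with Python's standard library. The helper name `split_uri_ref` is illustrative only; `urldefrag` is the standard `urllib.parse` function.

```python
from urllib.parse import urldefrag


def split_uri_ref(uri_ref):
    """Return (absolute URI, fragment identifier) for an RDF URI reference.

    The fragment is the empty string when the reference has none.
    """
    absolute, fragment = urldefrag(uri_ref)
    return absolute, fragment


absolute, fragment = split_uri_ref("http://insult.example.com/lexicon#Clown")
```

For the reference above, `absolute` is `http://insult.example.com/lexicon` and `fragment` is `Clown`. Under the reading given in this section, the fragment is then interpreted against an application/rdf+xml representation of the resource identified by the absolute URI.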
This document contains a significant contribution from Pat Hayes, Sergey Melnik and Patrick Stickler, under whose leadership the framework described in the RDF family of specifications for representing datatyped values, such as integers and dates, was developed.
The editors acknowledge valuable contributions from the following:
Jeremy Carroll thanks Oreste Signore, his host at the W3C Office in Italy and Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", part of the Consiglio Nazionale delle Ricerche, where Jeremy is a visiting researcher.
This document is a product of extended deliberations by the RDF Core Working Group, whose members have included:
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working Group members who contributed to this earlier work are: