Nine by Nine: Using datatype-aware inferences with RDF

Nine by Nine	G. Klyne
	Nine by Nine
	November 5, 2003

Using datatype-aware inferences with RDF

Abstract

"Using datatype-aware inferences with RDF" explores some options for incorporating well-known datatype properties into applications that perform inference over RDF data. It is primarily concerned with how datatype properties are incorporated into application descriptions, rather than the underlying inference mechanisms.

A primary purpose of this note is to explore options for incorporating datatype inference mechanisms into Swish, a framework for Semantic Web inference using Haskell.

Copyright © 2003, G. Klyne, Nine by Nine

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is available at http://www.gnu.org/copyleft/fdl.html.

$Id: RDF-Datatype-inference.html,v 1.1 2003/12/12 20:35:50 graham Exp $

Table of Contents

1. Introduction

1.1 Terminology and conventions

2. Some existing mechanisms

2.1 CWM rules

2.2 Euler

2.3 Web Ontology Language (OWL)

2.4 DAML rules language

2.5 OWL Datatype Groups

2.6 Jena inference engine

2.7 Sesame

2.8 Intellidimension RDF Gateway

2.9 RDF Inference Language (RIL)

2.10 Metalog

3. Inference mechanisms in Swish

4. Choices for defining inference in Swish

4.1 Haskell code for each new inference rule

4.2 Rules with variable binding modifiers

4.3 Rules with special properties

4.4 Constraints using variable binding modifiers

4.5 Constraints using class restrictions

5. Implementing datatype inference in Swish

6. Test cases

6.1 Test 001: forward integer sum

6.2 Test 002: backward integer sum

6.3 Test 003: all consistent values supplied

6.4 Test 004: inconsistent values supplied

6.5 Test 005: consistent values with cardinality constraints

6.6 Test 006: inconsistent values with cardinality constraints

6.7 Test 007: incomplete values supplied

§ References

§ Author's Address

A. Revision history

B. Review notes and todo list

1. Introduction

This memo explores ways in which properties of well-known datatypes may be incorporated into applications that perform inference over RDF data. Consider the following simple motivating example.

Given:

  :vehicle :seatedCapacity "30"^^xsd:integer .
  :vehicle :standingCapacity "10"^^xsd:integer .

we may reasonably wish to deduce:

  :vehicle :totalCapacity "40"^^xsd:integer .

based on application domain knowledge that the total capacity is the arithmetic sum of the seated and standing capacities of a vehicle, where the arithmetic sum is presumed to be a known property of the xsd:integer datatype.

One approach, adopted by CWM [9], and many other systems, is to express this domain knowledge as an inference rule. But RDF as currently standardized is not capable of expressing inference rules, so this immediately forces us outside what can be expressed using RDF.

This note is intended to explore the idea that required application inference patterns can be expressed using a combination of standard RDF with possible semantic extensions (for expressing the application domain knowledge) and separate "built in" knowledge of datatypes used.

1.1 Terminology and conventions

Examples of RDF data are presented using the Notation 3 format [8].

[[[Comments enclosed in triple-brackets, like this, contain editorial comments and notes]]]

2. Some existing mechanisms

This section briefly surveys some existing mechanisms for incorporating application domain and datatype knowledge into RDF inferences.

2.1 CWM rules

A widely used tool for creating RDF applications that perform inference is CWM [9].

CWM is a forward-chaining reasoner, which uses an extended RDF syntax that is capable of describing simple rules used in forward chaining.

A rule for the previous example could be expressed in CWM thus:

    { ?v :seatedCapacity ?c1 .
      ?v :standingCapacity ?c2 .
      (?c1 ?c2) math:sum ?c3 . }
  => 
    { ?v :totalCapacity ?c3 . }

Points to note are:

The term ?name indicates a name that is universally quantified within the scope of the rule. Standard RDF does not provide any way to express universal quantification. It is in the nature of a rule that it generalizes some assertion over arbitrarily many things, rather than asserting information about a single thing.
The property math:sum is a built-in property with special semantics known to CWM. It embodies knowledge of the relationship between numbers and their arithmetic sum. CWM defines a number of such built-in properties [10].
The CWM built-in properties are not specifically linked to RDF datatypes, though many reasonably could be. In many cases, the CWM built-ins refer to RDF plain literal values.
CWM requires that the built-in properties be used in the antecedent of a rule.

2.2 Euler

Euler [20] is one of the very early inference tools for RDF, and has been regularly updated to test proposed semantics from the RDFcore working group [23].

Euler is essentially a backward-chaining reasoner, using rules similar to those defined for CWM. Euler also supports a range of CWM's built-in properties.

Euler treats built-in properties in a rule antecedent as terms to be unified, just like any others, except that the unification is handled by special code rather than by reference to the knowledge base. The terms in the antecedent of a rule are taken in the order they are given, and unified one at a time, adding new variable bindings as they arise. Ordering of terms in the antecedent is important because unification of terms that appear later in the antecedent may depend on bindings created by unification of preceding terms.

Individual terms (statements) in a rule antecedent are generally required to have at least their objects bound to specific values, and new bindings may be created as required for variables used in the subject position.

(See also: http://www.agfa.com/w3c/euler/easterP.n3 for an example of Euler using CWM builtins to calculate a date for Easter.)

2.3 Web Ontology Language (OWL)

OWL [7] is a semantic extension of RDF that can be used to describe classes of objects, and thereby provides a basis for certain kinds of reasoning about values declared to be members of OWL classes.

As far as I can tell, OWL does not provide the kind of universally quantified value that is implicit in a CWM-style inference rule. Universal quantification in OWL is limited to expressions of the form:

FORALL x, x in class C => (x satisfies some conditions). (For example, the rdfs:subClassOf construct.)
FORALL x, (x satisfies some conditions) <=> x in some un-named class. (This is the owl:Restriction form. Because the restriction is an unnamed class in OWL-DL, the syntax does not provide a way for it to be used as an antecedent to an implication of the first form above. This naming restriction is relaxed in OWL-Full.)

As OWL-Full allows an owl:Restriction to be named and used wherever a class name can appear, it appears that it provides a basis for defining rules based on whatever constraint conditions can be used within a restriction.

2.4 DAML rules language

DAML-rules [11] is a proposal of the DAML joint committee for a rules language based on OWL semantics. It extends the existing forms of OWL ontology axioms and facts with a form of "rule axiom".

The basic rule format is a simple "antecedents imply consequent" form, and appears to be suitable for use with forward or backward chaining. The essential atomic truth values in both antecedent and consequent parts are:

Object is an instance of a described OWL class.
A pair of objects are related by a specified OWL property, where the object may be an OWL individual or data value.

Variables may be used for individuals and data values within a rule, in which case they are treated as universally quantified within the scope of the rule.

It is not clear that there is any specific mechanism here for dealing with relations on datatyped values, other than what is already provided by OWL (see Web Ontology Language (OWL) above).

2.5 OWL Datatype Groups

In their paper Web Ontology Reasoning with Datatype Groups [12], Pan and Horrocks discuss datatype inferencing with OWL in particular, and Description Logics in general. They suggest that datatype relations can be expressed by extending OWL's notion of a class restriction. (This paper, and the notion of Datatype Groups, is largely concerned with inference processes that are guaranteed to be tractable; this note is not concerned with such issues.)

Using a class restriction in this way replaces the universally quantified variables in a rule with references to properties of some resource that may be any member of a class. The given conditions are true of all members of the class thus described, so the thing described is not mentioned directly, and there is no explicit term to be quantified.

2.6 Jena inference engine

Jena [13] is a general purpose RDF toolkit that includes an inference engine.

The Jena inference engine incorporates a hybrid forward and backward chaining reasoner, with provision for extension through the addition of further inference code written in Java. I anticipate that relations on datatyped values (like the arithmetic example) would be catered for by added Java code.

Jena also incorporates a general purpose rule engine that can be used with simple rules, performing a combination of both forward and backward chaining [14].

2.7 Sesame

Sesame [15] is another RDF toolkit written in Java, with provision for inferencing. As far as I can tell, inferencing capability is provided through Java code add-ins implementing a defined inference interface.

2.8 Intellidimension RDF Gateway

Intellidimension's RDF Gateway [16] is a platform for building RDF applications, which includes an inference rule processor.

The general style of inference supported is forward chaining, similar to CWM, though the details are different. The package also supports "function rules" that can be used to create additional variable bindings for use in new statements deduced using a rule. These function rules serve a purpose comparable with CWM's built-in properties.

2.9 RDF Inference Language (RIL)

The RDF Inference Language (RIL) [17] is a language for defining inference rules. It appears to be implemented in the 4suite platform [18].

RIL describes a simple forward-chaining inference process similar to CWM and Intellidimension's RDF Gateway described above. (Unlike those, it allows a rule antecedent to contain disjunctions and negations.)

2.10 Metalog

Metalog [19] is a W3C project. [[[Details TBD. I couldn't find sufficiently detailed online documentation]]]

It appears that Metalog is an attempt to provide a pseudo-natural-language expression to a Prolog-like rule language over RDF. This suggests a backward chaining reasoner over Horn-clauses. I couldn't see any mechanism for dealing with datatype values.

3. Inference mechanisms in Swish

The preceding survey is intended to provide some insights into ways to present the mechanisms implemented in Swish [22], my Semantic Web inference toolkit in Haskell.

NOTE: The inference components of this software have not yet been published (as of 31-Oct-2003).

Swish defines a framework for inference to which new inference rules can be added by writing Haskell code that defines new instances of a specified inference datatype (cf. Sesame and Jena, above).

Any rule can be invoked in forward chaining, backward chaining or inference-checking modes:

Forward chaining: given some set of statements, uses the rule to deduce new statements. In principle, repeated application of forward chaining will find all facts that can be deduced by the inference rule from some initial knowledge base.
Backward chaining: given some consequent expression, determines all of the antecedents that must be satisfied in order for that expression to be true.
Inference checking: given some antecedent expressions and a consequent expression, determines whether or not the inference rule can deduce the consequent from the antecedents.

Not all rules are required to fully support forward and backward chaining, but they are all expected to support the inference-checking mode.

Here is the Haskell [21] data type for an inference rule:

-- |Rule is a data type for inference rules that can be used
--  to construct a step in a proof.
data Rule ex = Rule
    -- |Name of rule, for use when displaying a proof
    { ruleName :: ScopedName
    -- |Forward application of a rule, takes a list of
    --  expressions and returns a list (possibly empty)
    --  of forward applications of the rule to combinations
    --  of the antecedent expressions.
    , fwdApply :: [ex] -> [ex]
    -- |Backward application of a rule, takes an expression
    --  and returns a list of alternative antecedents, each of
    --  which is a list of expressions that jointly yield the
    --  given consequence through application of the inference
    --  rule.  An empty list is returned if no antecedents
    --  will allow the consequence to be inferred.
    , bwdApply :: ex -> [[ex]]
    -- |Inference check.  Takes a list of antecedent expressions
    --  and a consequent expression, returning True if the
    --  consequence can be obtained from the antecedents by
    --  application of the rule.  When the antecedents and
    --  consequent are both given, this is generally more efficient
    --  than using either forward or backward chaining.
    --  Also, a particular rule may not fully support either
    --  forward or backward chaining, but all rules are required
    --  to fully support this function.
    , checkInference :: [ex] -> ex -> Bool
    }

The Rule datatype is polymorphic in ex, the type of expression over which proofs may be constructed. For use with RDF, it is instantiated with a data type that represents an RDF graph.
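
To make the three modes concrete, here is a minimal sketch of a Rule value for the vehicle capacity example. It is not Swish code: the Triple and Graph types, the use of String in place of ScopedName, and the names valuesOf, capacityFwd and capacityRule are all assumptions made for illustration, and backward chaining is deliberately left unsupported.

```haskell
import Data.List (nub)

-- Hypothetical stand-ins for Swish types (not the real definitions).
type Triple = (String, String, Integer)   -- subject, property, integer value
type Graph  = [Triple]

-- Same shape as the Rule type above, with String replacing ScopedName.
data Rule ex = Rule
    { ruleName       :: String
    , fwdApply       :: [ex] -> [ex]
    , bwdApply       :: ex -> [[ex]]
    , checkInference :: [ex] -> ex -> Bool
    }

-- All values of property p on subject s across a list of graphs.
valuesOf :: String -> String -> [Graph] -> [Integer]
valuesOf s p gs = [ v | g <- gs, (s', p', v) <- g, s' == s, p' == p ]

-- Forward step: derive a totalCapacity statement for every subject
-- that has both a seated and a standing capacity.
capacityFwd :: [Graph] -> [Graph]
capacityFwd gs =
    [ [(s, "totalCapacity", c1 + c2)]
    | s  <- nub [ s' | g <- gs, (s', _, _) <- g ]
    , c1 <- valuesOf s "seatedCapacity" gs
    , c2 <- valuesOf s "standingCapacity" gs
    ]

capacityRule :: Rule Graph
capacityRule = Rule
    { ruleName       = "capacitySum"
    , fwdApply       = capacityFwd
    , bwdApply       = const []      -- not supported in this sketch
    , checkInference = \ants con ->
          not (null con) && all (`elem` concat (capacityFwd ants)) con
    }
```

Given a graph holding the seated capacity 30 and standing capacity 10 of :vehicle, fwdApply capacityRule yields a single graph containing the total capacity 40, and checkInference accepts that graph as a consequent.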

One of the inference modules implemented by Swish is a "graph closure" rule generator, which is a simple rule processor that operates along fairly conventional lines. Some of the RDF core inference rules [3] are defined in this way. A rule has an antecedent expression, which is a conjunction of statements (possibly using named variables), and a consequent expression which is also a conjunction of statements (possibly using variables). This module performs both backward and forward chaining of the rule by the following steps:

Query the graph using the antecedent and/or elements of the consequent expression. The result of such a query is a set of variable bindings.
Apply a variable binding modifier to the variable bindings. This can be a filter that allows only those bindings that satisfy a certain condition to be used (e.g. some of the rules in the RDF core semantics [3] are applicable only to certain kinds of graph node), or it may create some additional variable bindings based on those obtained from the graph query (e.g. the RDF semantics in some cases requires a new bnode to be "allocated to" an existing node). (These variable binding modifiers serve a similar purpose to the "Function Rules" in Intellidimension's RDF Gateway [16].)
Use the variable bindings to perform substitutions into the consequent and/or antecedent expression, yielding the desired result.

Both forward and backward chaining follow the same query-modify-substitute pattern and share the same basic query and substitution mechanisms, but the details of how the queries and substitutions are used differ between the two.
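
The query-modify-substitute pattern can be sketched for the forward chaining case as follows. This is an illustrative reduction, not the Swish implementation: Node, Arc, Binding and every function name here are hypothetical, bindings are plain association lists, and sumModify stands in for a datatype-aware variable binding modifier.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical simplified node, arc and binding types (not Swish's own).
data Node = Res String | Lit Integer | Var String
    deriving (Eq, Show)

type Arc     = (Node, Node, Node)
type Binding = [(String, Node)]

-- Match a pattern node against a graph node, extending the binding.
matchNode :: Node -> Node -> Binding -> Maybe Binding
matchNode (Var v) n b = case lookup v b of
    Nothing             -> Just ((v, n) : b)
    Just n' | n' == n   -> Just b
            | otherwise -> Nothing
matchNode p n b
    | p == n    = Just b
    | otherwise = Nothing

matchArc :: Arc -> Arc -> Binding -> Maybe Binding
matchArc (ps, pp, po) (s, p, o) b =
    matchNode ps s b >>= matchNode pp p >>= matchNode po o

-- Step 1: query the graph with a conjunction of patterns.
query :: [Arc] -> [Arc] -> [Binding]
query pats graph = foldl step [[]] pats
  where
    step bs pat =
        [ b' | b <- bs, arc <- graph, Just b' <- [matchArc pat arc b] ]

-- Step 2: a modifier that binds c to a + b where both are integer literals.
sumModify :: String -> String -> String -> [Binding] -> [Binding]
sumModify a b c bs =
    [ (c, Lit (x + y)) : bnd
    | bnd <- bs
    , Just (Lit x) <- [lookup a bnd]
    , Just (Lit y) <- [lookup b bnd]
    ]

-- Step 3: substitute into a pattern; Nothing if a variable is unbound.
substArc :: Binding -> Arc -> Maybe Arc
substArc b (s, p, o) = (,,) <$> substNode s <*> substNode p <*> substNode o
  where
    substNode (Var v) = lookup v b
    substNode n       = Just n

-- Forward chaining as query, then modify, then substitute.
forward :: [Arc] -> ([Binding] -> [Binding]) -> [Arc] -> [Arc] -> [Arc]
forward ant modify con graph =
    concat (mapMaybe (\b -> mapM (substArc b) con) (modify (query ant graph)))
```

Passing id as the modifier corresponds to a rule with no additional constraints; for the capacity example it produces nothing, because the consequent variable for the total is never bound by the graph query alone.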

Here is the Haskell [21] data type from which a graph closure rule is constructed:

-- |Datatype for constructing a graph closure rule
data GraphClosure lb = GraphClosure
    { nameGraphRule :: ScopedName   -- ^ Name of rule for proof display
    , ruleAnt       :: [Arc lb]     -- ^ Antecedent triples pattern
                                    --   (may include variable nodes)
    , ruleCon       :: [Arc lb]     -- ^ Consequent triples pattern
                                    --   (may include variable nodes)
    , ruleModify    :: VarBindingModify lb lb
                                    -- ^ Structure that defines additional
                                    --   constraints and/or variable
                                    --   bindings based on other matched
                                    --   query variables.  Matching the
                                    --   antecedents.  Use 'varBindingId' if
                                    --   no additional variable constraints
                                    --   or bindings are added beyond those
                                    --   arising from graph queries.
    }

The GraphClosure datatype is polymorphic in lb, the type of node that may appear in a graph. For use with RDF, it is instantiated with a data type that represents an RDF graph label (i.e. URI, bnode or literal), or a named variable.

There are some similarities (and many differences) between the Swish graph closure rule module and the general purpose rule engine incorporated into Jena [13][14].

4. Choices for defining inference in Swish

This section considers some choices for defining inferences for use with Swish. These options are not mutually exclusive, and, in principle, any combination of them can be supported by the core inference framework, depending on how the application's user interface is implemented.

Swish does not currently have any mechanism for automated proof discovery (i.e. automated selection of which rules to apply), either when forward chaining or backward chaining, so the addition of multiple inference modules may create additional choices to be addressed later.

The following ideas for the presentation of datatyped inference definitions are suggested by the survey above:

Haskell code for each new inference rule.
Rules with variable binding modifiers.
Rules with special properties.
Constraints using variable binding modifiers.
Constraints using class restrictions.

4.1 Haskell code for each new inference rule

Each new inference rule may be defined using ad-hoc Haskell code to create a suitable value of type Rule RDFGraph.

For example, Haskell code could be written to recognize any two of the properties :seatedCapacity, :standingCapacity and :totalCapacity applied to the same subject resource, and define the third based on their known arithmetic relationship.
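
A rough sketch of such ad-hoc code follows. It works over a simplified triple type rather than the Rule RDFGraph type that Swish actually uses; Triple, lookupProp and completeCapacity are hypothetical names introduced only for this illustration.

```haskell
-- Hypothetical simplified statement type: subject, property, integer value.
type Triple = (String, String, Integer)

-- First value of property p on subject s, if any.
lookupProp :: String -> String -> [Triple] -> Maybe Integer
lookupProp s p g = case [ v | (s', p', v) <- g, s' == s, p' == p ] of
    (v:_) -> Just v
    []    -> Nothing

-- Given any two of the three capacities for subject s, return the
-- statement that supplies the third; otherwise return nothing new.
completeCapacity :: String -> [Triple] -> [Triple]
completeCapacity s g =
    case ( lookupProp s "seatedCapacity" g
         , lookupProp s "standingCapacity" g
         , lookupProp s "totalCapacity" g ) of
        (Just a,  Just b,  Nothing) -> [(s, "totalCapacity", a + b)]
        (Just a,  Nothing, Just c ) -> [(s, "standingCapacity", c - a)]
        (Nothing, Just b,  Just c ) -> [(s, "seatedCapacity", c - b)]
        _                           -> []
```

Every property name and every usage pattern is written out by hand here, which illustrates why this style does not scale well to new application vocabularies.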

This approach is clearly flexible, but is not very friendly to non-Haskell-programmers, and may be somewhat error prone. (It also disregards existing code that could perform most if not all of what is required to implement the inference.)

I believe there will always be some cases where ad-hoc code is justified (RDF simple entailment is a case in point), but some easier form for non-programmers to express application domain knowledge is much desired. Ideally, it will be possible to enable some new inferences based on just RDF input, without having to use a different language to define the application domain knowledge on which the inference is based.

4.2 Rules with variable binding modifiers

A graph closure rule generator is one of the built-in features of Swish. In its simplest form, it represents an antecedent and consequent which can be used in forward or backward chaining. To incorporate datatype reasoning, a number of special variable binding modifiers can be defined, corresponding to different relationships between values of common datatypes.

A rule definition would then consist of three parts:

a representation of an antecedent graph (or graphs), with variables for some of the nodes (or properties),
a similar representation of a consequent graph, and
references to one or more variable binding modifiers. The variable binding modifiers would be implemented as Haskell code, each corresponding to some common relationship between datatype values (e.g. math:sum representing the relationship between a, b and c in a+b=c).

Thus, we might have:

  RULE
    { ?v :seatedCapacity ?c1 .
      ?v :standingCapacity ?c2 . }
  => 
    { ?v :totalCapacity ?c3 . }
  WHERE
    xsd_integer:sum ?c1 ?c2 ?c3

This structure can be mapped directly onto Swish's built-in framework for dealing with graph rules, where xsd_integer:sum is the name given to a Haskell description of the + relation between integer values.

A drawback of this is that new syntactic structure (i.e. beyond that supported by RDF) must be introduced to accommodate rule definitions.

New rules of this form can be defined directly in Haskell, based on a value of type GraphClosure (see Inference mechanisms in Swish above); many of the built-in RDF(S) inference rules are defined in this way.

4.3 Rules with special properties

A variation of simple rules enhanced with datatype inference is that used by CWM, in which special properties in the rule antecedent are given special treatment. For example, all statements of the form (a b) math:sum c, where a+b=c, are assumed to be in the knowledge base, and the behaviour of the knowledge base query function is modified accordingly. This can be used for both forward chaining and backward chaining.

Using the CWM [9] notation, also used by Euler [20], our motivating example looks like this:

    { ?v :seatedCapacity ?c1 .
      ?v :standingCapacity ?c2 .
      (?c1 ?c2) math:sum ?c3 . }
  => 
    { ?v :totalCapacity ?c3 . }

The graph closure rule module in Swish keeps the graph querying separate from other ways to create variable bindings, so this approach would not be supported directly by existing code. But I believe it would be relatively easy to write a new inference module that modifies the graph query behaviour when processing the antecedent of a rule.

Adopting this approach has the advantage of commonality with some existing tools, especially since CWM appears to be widely used. Set against this is that CWM rules are expressed using an extension to the syntactic structure of RDF, viz Formulae and Contexts.

4.4 Constraints using variable binding modifiers

A drawback of the antecedent-and-consequent approach to defining rules is that different rules are required to reflect differing patterns of available information. In the case of the vehicle capacity example, if we know the seating and standing capacities we can deduce the total capacity. Or if we know the total capacity and one of the seating or standing capacities, we can deduce the other. This is a property of the arithmetic relationship between them: knowing any two of them uniquely defines the third [[[does this property have a proper mathematical name?]]]. The common rule-based approach to inference does not reflect this, and a separate rule is needed for each possible pattern of available input.

Where values are related by a property like arithmetic sum, it seems desirable that a single rule can capture all of the inferences that are given by that relationship. Swish variable binding modifiers can capture these differing inference patterns.

For the motivating example, one might contemplate something like:

  { ?v :seatedCapacity ?c1 .
    ?v :standingCapacity ?c2 .
    ?v :totalCapacity ?c3 . }
  WHERE
    xsd_integer:sum ?c1 ?c2 ?c3

and expect to be able to generate the third statement from any two of the three.

This seems reasonable in the case of the specific example, but it is not obvious how it might be generalized. An attempt to express this example in logical terms:

  forall ?v .
    ?v in domain(:seatedCapacity) &&
    ?v in domain(:standingCapacity) &&
    ?v in domain(:totalCapacity)
    exists ?c1 ?c2 ?c3 .
      { ?v :seatedCapacity ?c1 .
        ?v :standingCapacity ?c2 .
        ?v :totalCapacity ?c3 . } &&
      xsd_integer:sum ?c1 ?c2 ?c3

introduces a mixture of universal and existential quantifications. This is not an exact logical statement of the original notion, which was triggered by some combination of actual properties.

[[[I could pursue this, but it's not looking like a helpful approach. The ideas I'm entertaining involve a two-part rule-like form, containing an antecedent that must be matched, and a constraint part that may be partially matched, and variable constraints used to fill in the constraint elements that are not matched.]]]

4.5 Constraints using class restrictions

A possible approach allowing inference from multiple patterns of available information is suggested by OWL-style class-based reasoning [7], combined with an idea for extended OWL-like class restrictions noted by Pan and Horrocks [12].

It was noted above in OWL Datatype Groups that a class restriction is applicable over the class of things defined, without introducing a name for such things that must be universally quantified. Thus, class constraints provide an alternative to the universally quantified names for values that are typical of many of the rule systems examined.

Returning to the original example, we might have:

  :PassengerVehicle a swish:GeneralRestriction, owl:Class ;
    swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
    swish:constraint xsd_integer:sum .

This generalizes the notions of owl:Restriction and owl:BinaryRestriction (cf. [7][12]) to a restriction on several properties that are presented as a list.

The swish:constraint property names a relation that constrains the properties in such a way that if some are known, others may be inferred. The patterns of possible inferences depend on the relation used: in the case of xsd_integer:sum, any two values may be used to deduce the third related value. The ordering of the swish:onProperties list of properties is significant, as it matches the order of values in the swish:constraint relation.
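
One way to picture how such a restriction might drive inference is sketched below. This is an assumption-laden illustration rather than a description of Swish: the record GeneralRestriction, the Relation type (a function from a partially known tuple to all consistent complete tuples), sumRel, valueOf and applyRestriction are all hypothetical, and statements are reduced to simple triples with integer objects.

```haskell
-- Hypothetical simplified statement type: subject, property, integer value.
type Triple = (String, String, Integer)

-- A relation over a partially known tuple: Nothing marks an unknown value.
-- The result lists every complete tuple consistent with the known values.
type Relation = [Maybe Integer] -> [[Integer]]

data GeneralRestriction = GeneralRestriction
    { onProperties :: [String]   -- order matches the relation's arguments
    , constraint   :: Relation
    }

-- a + b = c: any two values determine the third; a fully specified
-- tuple is accepted only if it is consistent.
sumRel :: Relation
sumRel [Just a,  Just b,  Nothing] = [[a, b, a + b]]
sumRel [Just a,  Nothing, Just c ] = [[a, c - a, c]]
sumRel [Nothing, Just b,  Just c ] = [[c - b, b, c]]
sumRel [Just a,  Just b,  Just c ] = [ [a, b, c] | a + b == c ]
sumRel _                           = []

-- First value of property p on subject s, if any.
valueOf :: [Triple] -> String -> String -> Maybe Integer
valueOf g s p = case [ v | (s', p', v) <- g, s' == s, p' == p ] of
    (v:_) -> Just v
    []    -> Nothing

-- Forward step: the new statements about subject s that complete
-- the restricted properties.
applyRestriction :: GeneralRestriction -> String -> [Triple] -> [Triple]
applyRestriction r s g =
    [ (s, p, v)
    | vals            <- constraint r known
    , (p, v, Nothing) <- zip3 (onProperties r) vals known
    ]
  where
    known = map (valueOf g s) (onProperties r)

passengerVehicle :: GeneralRestriction
passengerVehicle = GeneralRestriction
    { onProperties = ["seatedCapacity", "standingCapacity", "totalCapacity"]
    , constraint   = sumRel
    }
```

Because the relation returns a list of tuples, the same shape accommodates relations that admit zero, one or several completions for a given set of known values.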

An advantage of this approach is that it is easily described within the existing RDF syntactic framework. It doesn't depend on contexts for defining a rule.

The effect of a simple inference rule of the form provided by CWM is obtained if the relation used defines only one usage pattern. For example, if xsd_integer:sum in the example above were implemented to deduce c from a and b in a+b=c, then this would enable just the same inferences as this CWM rule:

    { ?v :seatedCapacity ?c1 .
      ?v :standingCapacity ?c2 .
      (?c1 ?c2) math:sum ?c3 . }
  => 
    { ?v :totalCapacity ?c3 . }

As presented, this form of inference only provides for generalization over a single variable, a member of some class. This may well be sufficient for many practical applications requiring datatype inference, but if generalization over multiple simultaneous values is needed I think this could be achieved by generalizing over a composite value (e.g. resources denoting pairs of values).

5. Implementing datatype inference in Swish

Based on the foregoing, I find that the generalization of class restrictions appears to provide a convenient way of incorporating datatype inferences into Swish.

The datatype relations can be implemented as Haskell functions, and given URIs to serve as resource names.
Inference patterns can be defined using an RDF-equivalent syntax, in the form of class restrictions.

Forward-chaining inference consists of matching as many as possible of the properties named in a class restriction, then applying the relation to deduce values to complete the remaining statements, which can then be accepted as newly deduced statements.

Backward chaining inference is essentially the same process, except that the result is new statements that remain to be proved.

[[[Details TBD. Roughly: map class restriction into a series of queries with auto-generated variable names; use existing Swish logic for dealing with variable binding constraints; and back-substitute new variable bindings to obtain new statements. Note that there may be zero, one or more final results for a given set of query matches; e.g. x=(+/-)y, if y is +ve then x=+y or x=-y.]]]

These are some of the structures that Swish uses to define a datatype:

-- |Datatype is a structure that defines a number of functions
--  and values that characterize the behaviour of a datatype.
--
--  A datatype is specified with respect to (polymorphic in) a given
--  type of (syntactic) expression with which it may be used, and
--  a value type (whose existence is hidden as an existential type
--  within DatatypeMap).
--
--  (I tried hiding the value type with an internal existential
--  declaration, but that wouldn't wash.  Hence this two-part
--  structure with Datatype (above) in which the internal detail
--  of the value type is hidden from users of the Datatype class.)
--
--  The datatype characteristic functions have two goals:
--  - to support the general datatype entailment rules defined by
--    the RDF semantics specification, and
--  - to define additional datatype-specific inference patterns by
--    means of which to provide additional base functionality to
--    applications based on RDF inference.  The model for datatype
--    value calculations is inspired by that introduced by CWM for
--    arithmetic operations, e.g.
--       (1 2 3) math:sum ?x => ?x rdf:value 6
--    (where the bare integer n here is shorthand for "n"^^xsd:integer)
--
--  Datatype-specific inference patterns are provided in two ways:
--  (a) by variable binding modifiers that can be combined with the
--      query results during forward- or backward-chaining of
--      inference rules, and
--  (b) by the definition of inference rulesets that involve
--      datatype values.
--  I believe the first method to be more flexible than the second,
--  in that it more readily supports forward and backward chaining,
--  but can be used only through the definition of new rules.
--
--  Note that rules and variable binding modifiers that deal with
--  combined values of more than one datatype may be defined
--  separately.  Definitions in this module are generally applicable
--  only when using a single datatype.
--
--  ex      is the type of expression with which the datatype may be used.
--  lb      is the type of the variable labels used.
--  vn      is the type of value node used to contain a datatyped value
--  vt      is the internal value type with which the labels are associated.
--
data DatatypeVal ex lb vn vt = DatatypeVal
    { tvalName  :: ScopedName   -- ^Identifies the datatype, and also
                                --  its value space class.
    , tvalRules :: Ruleset ex   -- ^A set of named expressions and rules
                                --  that are valid in any theory that
                                --  recognizes the current datatype.
    , tvalMap   :: DatatypeMap vt
                                -- ^Lexical to value mapping, where 'vt' is
                                --  a datatype used within a Haskell program
                                --  to represent and manipulate values in
                                --  the datatype's value space
    , tvalVmods :: [DatatypeVmod lb vn]
                                -- ^A set of named variable binding
                                --  modifier functions that may be
                                --  referenced by rule definitions.
    }

-- |DatatypeMap consists of methods that perform lexical-to-value
--  and value-to-canonical-lexical mappings for a datatype.
--
--  The datatype mappings apply to string lexical forms.
--
data DatatypeMap vt = DatatypeMap
    { mapL2V  :: String -> Maybe vt
                            -- ^ Function to map lexical string to
                            --   datatype value.  This effectively
                            --   defines the lexical space of the
                            --   datatype to be all strings that
                            --   yield a value other than Nothing.
    , mapV2L  :: vt -> Maybe String
                            -- ^ Function to map a value to its canonical
                            --   lexical form, if it has such.
    }

-- |Type for variable binding modifier that has yet to be instantiated
--  with respect to the variables that it operates upon.
type OpenVarBindingModify lb vn = [lb] -> VarBindingModify lb vn

-- |Named variable binding modifier.
--
--  lb  is the type of the variable labels used
--  vn  is the type of value node used to contain a datatyped value
--
data DatatypeVmod lb vn = DatatypeVmod
    { dvModName :: ScopedName   -- ^Name of associated variable binding
                                --  modifier function.
    , dvModify  :: OpenVarBindingModify lb vn
                                -- ^Returns a variable binding modifier
                                --  function that operates on the variables
                                --  supplied.
    }
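
As an illustration of the lexical-to-value mapping described above, the following is a minimal sketch of a DatatypeMap for xsd:integer. The DatatypeMap record is repeated so that the fragment stands alone; the names mapXsdInteger, parseInteger and parseDigits are hypothetical and are not taken from Swish.

```haskell
import Data.Char (isDigit)

-- Local copy of the DatatypeMap record shown above, so the sketch stands alone.
data DatatypeMap vt = DatatypeMap
    { mapL2V :: String -> Maybe vt
    , mapV2L :: vt -> Maybe String
    }

-- Hypothetical mapping for xsd:integer (illustrative name, not from Swish).
mapXsdInteger :: DatatypeMap Integer
mapXsdInteger = DatatypeMap
    { mapL2V = parseInteger
    , mapV2L = Just . show      -- no leading zeros, no '+' sign
    }

-- Accept an optional sign followed by one or more decimal digits.
parseInteger :: String -> Maybe Integer
parseInteger ('-':ds) = negate <$> parseDigits ds
parseInteger ('+':ds) = parseDigits ds
parseInteger ds       = parseDigits ds

-- Any string that is not a non-empty run of ASCII digits maps to Nothing,
-- which places it outside the lexical space.
parseDigits :: String -> Maybe Integer
parseDigits ds
    | not (null ds) && all isDigit ds = Just (read ds)
    | otherwise                       = Nothing
```

Under these assumptions, several lexical forms ("30", "+30", "030") map to the same value, and mapV2L returns a single canonical form for it.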

A key capability in Swish used by the datatype framework is the variable binding modifier:

-- |Define the type of a structure used to modify variable bindings
--  in forward chaining based on rule antecedent matches.  This
--  function is used to implement the "allocated to" logic described
--  in Appendix B of the RDF semantics document, in which a specific
--  blank node is associated with all matches of some specific value
--  by applications of the rule on a given graph.
--  Use 'id' if no modification of the variable bindings is required.
--
--  This datatype consists of the modifier function itself, which
--  operates on a list of variable bindings rather than a single
--  variable binding (because some modifications share context across
--  a set of bindings), and some additional descriptive information
--  that allows possible usage patterns to be analyzed.
--
--  Some usage patterns (see vbmUsage):
--  (a) filter:  all variables are input variables, and the effect
--      of the modifier function is to drop variable bindings that
--      don't satisfy some criterion.
--      Identifiable by an empty element in vbmUsage.
--  (b) source:  all variables are output variables:  a raw query
--      could be viewed as a source of variable bindings.
--      Identifiable by an element of vbmUsage equal to vbmVocab.
--  (c) modifier:  for each supplied variable binding, one or more
--      new variable bindings may be created that contain the
--      input variables bound as supplied plus some additional variables.
--      Identifiable by an element of vbmUsage that is some subset 
--      of vbmVocab.
--
--  A variety of variable usage patterns may be supported by a given
--  modifier:  a modifier may be used to define new variable bindings
--  from existing bindings in a number of ways, or simply to check that
--  some required relationship between bindings is satisfied.
--  (Example, for a + b = c, any one variable can be deduced from the
--  other two, or all three may be supplied to check that the relationship
--  does indeed hold.)
--
data VarBindingModify a b = VarBindingModify
    { vbmApply :: [VarBinding a b] -> [VarBinding a b]
                            -- ^Apply variable binding modifier to a
                            --  list of variable bindings, returning a
                            --  new list.  The result list is not
                            --  necessarily the same length as the
                            --  supplied list.
    , vbmVocab :: [a]       -- ^List of variables used by this modifier.
                            --  All results of applying this modifier contain
                            --  bindings for these variables.
    , vbmUsage :: [[a]]     -- ^List of binding modifier usage patterns
                            --  supported.  Each pattern is characterized as
                            --  a list of variables for which new bindings
                            --  may be created by some application of this
                            --  modifier, assuming that bindings for all other
                            --  variables in vbmVocab are supplied.
    }

This framework was originally defined to support some of the graph closure rules in the RDF formal semantics [3]. It also appears to provide the functionality needed to support class constraint reasoning over datatype values.
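
The a + b = c example mentioned in the comments above can be sketched as a variable binding modifier. This is an illustration only: VarBinding is replaced here by a simple association list (an assumption made to keep the fragment self-contained, not the Swish definition), and sumModify is a hypothetical name.

```haskell
import Data.Maybe (mapMaybe)

-- Simplified stand-in for the Swish VarBinding type (assumption):
-- a list of (variable, value) pairs.
type VarBinding a b = [(a, b)]

-- Local copy of the record shown above.
data VarBindingModify a b = VarBindingModify
    { vbmApply :: [VarBinding a b] -> [VarBinding a b]
    , vbmVocab :: [a]
    , vbmUsage :: [[a]]
    }

-- Hypothetical modifier for the relation a + b = c over integers.
sumModify :: Eq a => a -> a -> a -> VarBindingModify a Integer
sumModify a b c = VarBindingModify
    { vbmApply = mapMaybe complete
    , vbmVocab = [a, b, c]
      -- [] is the filter pattern; the others each create one new binding.
    , vbmUsage = [[], [a], [b], [c]]
    }
  where
    complete vb = case (lookup a vb, lookup b vb, lookup c vb) of
        -- All three supplied: keep the binding only if the sum holds.
        (Just x,  Just y,  Just z)  | x + y == z -> Just vb
        -- Two supplied: add a binding for the third.
        (Just x,  Just y,  Nothing) -> Just ((c, x + y) : vb)
        (Just x,  Nothing, Just z)  -> Just ((b, z - x) : vb)
        (Nothing, Just y,  Just z)  -> Just ((a, z - y) : vb)
        -- Otherwise (too few values, or an inconsistent triple): drop it.
        _                           -> Nothing
```

Every binding that survives vbmApply contains values for all of vbmVocab, which is the property the vbmVocab comment requires. Bindings that cannot be completed, or that fail the check, are dropped rather than reported.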


6. Test cases

This section defines some test cases that illustrate how an implementation of datatype inferencing is intended to behave.

All test cases assume the following namespace prefix definitions:

  [[[TBD]]]

6.1 Test 001: forward integer sum

This is the original motivating example.

From:

  :PassengerVehicle a swish:GeneralRestriction, owl:Class ;
    swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
    swish:constraint xsd_integer:sum .

  _:a a :PassengerVehicle ;
    :seatedCapacity "30"^^xsd:integer ;
    :standingCapacity "20"^^xsd:integer .

Deduce:

  _:a :totalCapacity "50"^^xsd:integer .

6.2 Test 002: backward integer sum

Alternative usage patterns of the integer sum rule: a missing addend is deduced from the total and the other addend.

From:

  :PassengerVehicle a swish:GeneralRestriction, owl:Class ;
    swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
    swish:constraint xsd_integer:sum .

  _:a a :PassengerVehicle ;
    :seatedCapacity "30"^^xsd:integer ;
    :totalCapacity "51"^^xsd:integer .

  _:b a :PassengerVehicle ;
    :standingCapacity "20"^^xsd:integer ;
    :totalCapacity "52"^^xsd:integer .

Deduce:

  _:a :standingCapacity "21"^^xsd:integer .

  _:b :seatedCapacity "32"^^xsd:integer .

6.3 Test 003: all consistent values supplied

Integer sum rule with consistent values provided for all properties.

From:

  :PassengerVehicle a swish:GeneralRestriction, owl:Class ;
    swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
    swish:constraint xsd_integer:sum .

  _:a a :PassengerVehicle ;
    :seatedCapacity "30"^^xsd:integer ;
    :standingCapacity "23"^^xsd:integer ;
    :totalCapacity "53"^^xsd:integer .

Deduce no new information.

6.4 Test 004: inconsistent values supplied

Integer sum rule with inconsistent values provided for all properties.

From:

  :PassengerVehicle a swish:GeneralRestriction, owl:Class ;
    swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
    swish:constraint xsd_integer:sum .

  _:a a :PassengerVehicle ;
    :seatedCapacity "30"^^xsd:integer ;
    :standingCapacity "20"^^xsd:integer ;
    :totalCapacity "54"^^xsd:integer .

Deduce:

  _:a :standingCapacity "24"^^xsd:integer .

  _:a :seatedCapacity "34"^^xsd:integer .

  _:a :totalCapacity "50"^^xsd:integer .

[[[This behaviour is not obviously the "correct" specification. Another approach, easier to implement, would be for a GeneralRestriction to implicitly assume a cardinality of 1 for the constrained properties (see Test 006: inconsistent values with cardinality constraints).]]]
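
One possible reading of the multi-valued behaviour, in which each property may carry several values and every pair of supplied values yields a candidate for the third property, can be sketched as follows. The function deduceAll is hypothetical and performs a single pass over the supplied values only; it is not a description of the Swish implementation.

```haskell
import Data.List (nub)

-- Hypothetical single-pass deduction for seated + standing = total.
-- Arguments are the supplied values of each property; the result lists
-- the newly derived values for (seated, standing, total).
deduceAll :: [Integer] -> [Integer] -> [Integer]
          -> ([Integer], [Integer], [Integer])
deduceAll seated standing total =
    ( new seated   [ t - s | t <- total,  s <- standing ]
    , new standing [ t - s | t <- total,  s <- seated   ]
    , new total    [ s + t | s <- seated, t <- standing ]
    )
  where
    -- Keep only derived values that were not already supplied.
    new have derived = nub [ v | v <- derived, v `notElem` have ]
```

With the Test 001 inputs, deduceAll [30] [20] [] gives ([], [], [50]). With the Test 003 and Test 007 inputs it gives ([], [], []), matching "Deduce no new information".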

6.5 Test 005: consistent values with cardinality constraints

Integer sum rule with all property values provided. Repeats test 003 (see Test 003: all consistent values supplied), except that the properties have cardinality constraints.

From:

  :PassengerVehicle a owl:Class ;
    rdfs:subClassOf
    [ a swish:GeneralRestriction ;
      swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
      swish:constraint xsd_integer:sum ] ;
    rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty :seatedCapacity ;
      owl:cardinality "1"^^xsd:nonNegativeInteger ] ;
    rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty :standingCapacity ;
      owl:cardinality "1"^^xsd:nonNegativeInteger ] ;
    rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty :totalCapacity ;
      owl:cardinality "1"^^xsd:nonNegativeInteger ] .

  _:a a :PassengerVehicle ;
    :seatedCapacity "30"^^xsd:integer ;
    :standingCapacity "25"^^xsd:integer ;
    :totalCapacity "55"^^xsd:integer .

Deduce no new information.

6.6 Test 006: inconsistent values with cardinality constraints

Integer sum rule with inconsistent property values provided. Repeats test 004 (see Test 004: inconsistent values supplied), except that the properties have cardinality constraints.

From:

  :PassengerVehicle a owl:Class ;
    rdfs:subClassOf
    [ a swish:GeneralRestriction ;
      swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
      swish:constraint xsd_integer:sum ] ;
    rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty :seatedCapacity ;
      owl:cardinality "1"^^xsd:nonNegativeInteger ] ;
    rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty :standingCapacity ;
      owl:cardinality "1"^^xsd:nonNegativeInteger ] ;
    rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty :totalCapacity ;
      owl:cardinality "1"^^xsd:nonNegativeInteger ] .

  _:a a :PassengerVehicle ;
    :seatedCapacity "30"^^xsd:integer ;
    :standingCapacity "20"^^xsd:integer ;
    :totalCapacity "56"^^xsd:integer .

Deduce that this is unsatisfiable (e.g. because it would require "50"^^xsd:integer = "56"^^xsd:integer).
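
When each constrained property has exactly one value, the sum rule reduces to a consistency check. The following sketch shows that check; the Verdict type and the checkSum function are hypothetical names introduced for illustration only.

```haskell
-- Hypothetical outcome of checking a cardinality-1 sum constraint.
data Verdict
    = Consistent
    | Unsatisfiable String   -- the equality that would have to hold
    deriving (Eq, Show)

-- Check seated + standing = total, given exactly one value per property.
checkSum :: Integer -> Integer -> Integer -> Verdict
checkSum seated standing total
    | derived == total = Consistent
    | otherwise        = Unsatisfiable (lexical derived ++ " = " ++ lexical total)
  where
    derived   = seated + standing
    -- Render a value as a typed literal, e.g. "50"^^xsd:integer.
    lexical n = show (show n) ++ "^^xsd:integer"
```

For the Test 005 values, checkSum 30 25 55 is Consistent. For the Test 006 values, checkSum 30 20 56 reports the required but false equality between the derived total and the supplied total.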

6.7 Test 007: incomplete values supplied

Integer sum rule with insufficient values provided to deduce the remaining values.

From:

  :PassengerVehicle a swish:GeneralRestriction, owl:Class ;
    swish:onProperties (:seatedCapacity :standingCapacity :totalCapacity) ;
    swish:constraint xsd_integer:sum .

  _:a a :PassengerVehicle ;
    :totalCapacity "57"^^xsd:integer .

Deduce no new information.

[[[More test cases as required. Include backward chaining and inference checking mode cases.]]]


References

[1]	Klyne, G. and J. Carroll, "Resource Description Framework (RDF): Concepts and Abstract Syntax", W3C LastCall WD-rdf-concepts-20031010, October 2003.
[2]	Beckett, D., "RDF/XML Syntax Specification (Revised)", W3C LastCall WD-rdf-syntax-grammar-20031010, October 2003.
[3]	Hayes, P., "RDF Semantics", W3C LastCall WD-rdf-mt-20031010, October 2003.
[4]	Brickley, D. and R. Guha, "RDF Vocabulary Description Language 1.0: RDF Schema", W3C LastCall WD-rdf-schema-20031010, October 2003.
[5]	Beckett, D., "RDF/XML Syntax Specification (Revised)", W3C Working Draft rdf-syntax-grammar, October 2003.
[6]	Brickley, D. and R. Guha, "RDF Schema", W3C Working Draft rdf-schema, October 2003.
[7]	McGuinness, D. and F. van Harmelen, "OWL Web Ontology Language Overview", W3C Candidate Recommendation owl-features, August 2003.
[8]	Berners-Lee, T., "Notation 3: Ideas about Web Architecture", 1998.
[9]	Berners-Lee, T., "CWM: A general purpose data processor for the semantic web", October 2003.
[10]	Berners-Lee, T., "Built-in functions in Cwm", May 2003.
[11]	Horrocks, I., Patel-Schneider, P., Boley, H. and S. Tabet, "A Proposal for an OWL Rules Language: Semantics and Abstract Syntax", October 2003.
[12]	Horrocks, I. and J. Pan, "Web Ontology Reasoning with Datatype Groups", 2003.
[13]	"HP Labs Semantic Web Research: Tools".
[14]	Reynolds, D., "Jena 2 Inference support", August 2003.
[15]	"Aidministrator: Sesame".
[16]	"Intellidimension: RDF gateway".
[17]	Olson, M., "RDF Inference Language (RIL)", May 2001.
[18]	"4Suite: an open-source platform for XML and RDF processing".
[19]	"Metalog: towards the Semantic Web".
[20]	De Roo, J., "Euler proof mechanism", October 2003.
[21]	"The Haskell home page".
[22]	Klyne, G., "Swish: a Semantic Web Inference Skeleton in Haskell", May 2003.
[23]	"W3C RDFcore working group".


Author's Address

	Graham Klyne
	Nine by Nine
EMail:	GK@ninebynine.org
URI:	http://www.ninebynine.net/


Appendix A. Revision history

30-Oct-2003

Memo initially created.

31-Oct-2003

Add more description of Euler and Swish.

03-Nov-2003

Added mention of Datatype Groups [12].

04-Nov-2003

Added descriptions of inference options for Swish, and selected an approach based on class restrictions.

05-Nov-2003

Start adding test cases.

    $Log: RDF-Datatype-inference.html,v $
    Revision 1.1  2003/12/12 20:35:50  graham
    Add datatype inferencing notes.

    Revision 1.7  2003/11/05 15:16:46  graham
    Sync minor edits

    Revision 1.6  2003/11/05 12:53:07  graham
    Add reference to GNU Free Documentation Licence.

    Revision 1.5  2003/11/05 12:10:09  graham
    Add an initial set of test cases.

    Revision 1.4  2003/11/04 21:46:12  graham
    Editorial fix-ups

    Revision 1.3  2003/11/04 19:10:50  graham
    Added description of Swish implementation options.

    Revision 1.2  2003/10/31 01:15:13  graham
    Create new document, and add to CVS


Appendix B. Review notes and todo list

Review comments marked with [[[...]]]
Complete description of datatype inference implementation using reasoning based on class restrictions.
Add references to other RDF-based inference systems as they come to light.
