This memo describes the software package Swish, which is being developed to conduct experiments in using the programming language Haskell as a basis for performing inference on Semantic Web data.
$Id: swish-0.1.html,v 1.3 2003/06/03 16:22:57 graham Exp $
This memo describes Swish, a software package that is intended to provide a framework for building programs in Haskell that perform inference on RDF [1] data.
Swish is primarily intended to be used as a starting point for developing new RDF applications in Haskell, but it also includes a stand-alone program that can be used to perform some simple manipulation of RDF data. Currently, only the Notation 3 [4] serialization form is supported, but I'd like to add support for full RDF/XML in due course.
The software can be downloaded from the web by following links from http://www.ninebynine.org/Software/Intro.html. This software is made generally available under the GNU General Public Licence (GPL), version 2 (or later), a full copy of which is available at http://www.gnu.org/licenses/gpl.html, and also as file GPL.TXT in the Swish software distribution. Other licensing arrangements may be negotiated with the author: see LICENSING.TXT in the Swish software distribution.
The Swish software package was developed as a result of experience using CWM [5], an off-the-shelf general-purpose RDF processing program, for evaluating simple inference rules on network access control information expressed in RDF [19]. Specifically, while the general inference capabilities of CWM were almost sufficient for the network access application, some capabilities were required that are unlikely to be provided by any completely general-purpose tool; e.g. analysis of IP network addresses by subnet and host address.
Additionally, the framework for datatyped literals currently proposed by the RDFcore working group [8] is quite open-ended, and it is not specified how generic applications may provide support for new or non-standard datatypes.
In light of these considerations, I sought ways of combining the full expressive capability of a general purpose programming language with the declarative style of inference rules and formal specifications. Using Haskell [11], a pure functional programming language, is the approach adopted.
To use Haskell as a basis for performing inference on RDF data, certain capabilities are needed:
Swish aims to provide these capabilities. Further, it provides capabilities to compare RDF graphs (insensitive to possible renaming of blank nodes), and to merge RDF graphs, renaming blank nodes as necessary to prevent unintended semantic consequences [9].
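The comparison and merge operations described above can be illustrated with a small sketch. It is not taken from the Swish sources: the types and the names Node, Graph, blankIds, renameWith, equivalent and mergeGraphs are all hypothetical, and the comparison is a brute-force search that is only practical for very small graphs.

```haskell
import Data.List (nub, permutations, sort)

-- Hypothetical, simplified model of RDF data; these are NOT the Swish types.
data Node = URI String | Blank String | Lit String
  deriving (Eq, Ord, Show)

type Triple = (Node, Node, Node)
type Graph  = [Triple]          -- treated as a set of triples

-- Blank node identifiers used anywhere in a graph, without duplicates.
blankIds :: Graph -> [String]
blankIds g = nub [ b | (s, p, o) <- g, Blank b <- [s, p, o] ]

-- Apply a blank node renaming table to every node of a triple.
-- Each node is looked up once, so the substitution is simultaneous.
renameWith :: [(String, String)] -> Triple -> Triple
renameWith table (s, p, o) = (rn s, rn p, rn o)
  where
    rn (Blank b) = Blank (maybe b id (lookup b table))
    rn n         = n

-- Brute-force equivalence test: try every one-to-one mapping of the
-- blank nodes of g1 onto those of g2 and see whether any of them makes
-- the two sets of triples identical. Exponential in the number of
-- blank nodes, so suitable for illustration only.
equivalent :: Graph -> Graph -> Bool
equivalent g1 g2 =
    length bs1 == length bs2 && any matches (permutations bs2)
  where
    bs1 = blankIds g1
    bs2 = blankIds g2
    canon = sort . nub
    matches perm = canon (map (renameWith (zip bs1 perm)) g1) == canon g2

-- Merge g2 into g1, renaming any blank node of g2 whose identifier is
-- already used in g1, so that distinct nodes are not accidentally
-- identified with each other.
mergeGraphs :: Graph -> Graph -> Graph
mergeGraphs g1 g2 = nub (g1 ++ map (renameWith table) g2)
  where
    ids1 = blankIds g1
    ids2 = blankIds g2
    (_, table) = foldl step (ids1 ++ ids2, []) ids2
    step (used, tbl) b
      | b `elem` ids1 = let b' = fresh used b in (b' : used, (b, b') : tbl)
      | otherwise     = (used, tbl)
    -- First identifier of the form b_1, b_2, ... that is not yet in use.
    fresh used b = go (1 :: Int)
      where
        go n | c `notElem` used = c
             | otherwise        = go (n + 1)
          where c = b ++ "_" ++ show n

main :: IO ()
main = do
  let g1 = [(Blank "a", URI "ex:p", Lit "1")]
      g2 = [(Blank "a", URI "ex:p", Lit "2")]
  print (equivalent g1 [(Blank "x", URI "ex:p", Lit "1")])  -- differs only in blank node name
  print (equivalent g1 g2)                                  -- differs in a literal value
  print (mergeGraphs g1 g2)
```

In this sketch the two blank nodes named "a" remain distinct after the merge, because the one from the second graph is given a fresh identifier before the triples are combined.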
I anticipate that the main use for Swish will be as a support library for new utilities that apply predefined RDF inference rules. Where CWM is a general-purpose tool for manipulating RDF, I expect to use Swish as a toolkit for creating tools to deal with specific RDF processing requirements. In time, this may lead to identification of some useful capabilities that can guide the design of future general-purpose RDF processing tools.
The programming language Haskell was chosen for a number of reasons:
More information about Haskell can be found at [11]. A useful paper discussing some particular characteristics of functional programming languages is [17].
Swish comprises a number of modules that can be invoked by Haskell programs, and a stand-alone command-line utility that can be used to perform some basic processing of RDF data.
The Haskell source code for the stand-alone utility may also be used as a starting point for similar utilities that perform specific application processing of RDF data.
Swish has the following components:
Currently, the only supported RDF graph serialization format is Notation3, but future developments may add support for other formats. RDF/XML would clearly be most desirable. Meanwhile, utilities such as CWM [5] can be used to convert RDF/XML to and from Notation3 format.
The Swish utility is a command-line utility that performs some simple RDF processing functions. The capabilities provided are intended for testing the underlying RDF library software rather than for serving any particular application purpose.
A Swish command contains one or more command line options that are processed from left to right. The Swish program maintains an internal graph workspace, which is updated or referenced as the command options are processed.
Swish command options:
- -?
- Displays a summary of the command line options.
- -n3
- Indicates that Notation3 be used for subsequent input and output. (Currently, this is the only format option, and is selected by default.)
- -i[=file]
- Reads file into the graph workspace, replacing any existing graph. If the filename is omitted, the graph is read from standard input.
- -m[=file]
- Reads file and merges it with the graph workspace. Blank nodes in the input file are renamed as necessary to avoid node identifiers already used by the existing graph. If the filename is omitted, the graph is read from standard input.
- -c[=file]
- Reads file and compares the resulting graph with the workspace. Graph comparison treats isomorphic graphs as equivalent, and is insensitive to renaming of blank nodes. This is intended to match the definition of graph equivalence in the RDF abstract syntax specification [10]. If the filename is omitted, the graph is read from standard input. If the graphs are unequal, the exit status code is 1.
- -o[=file]
- Writes the graph workspace to a file. If the filename is omitted, the graph is written to the standard output.
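As an illustration of the option syntax listed above, the following sketch parses a command string into a list of options, preserving the left-to-right order in which they would be processed. It is not the code in SwishCommands.hs: the Option type and the names parseOption and parseCommand are hypothetical, the file names are invented for the example, and treating "-i=" with an empty file name as an error is an assumption.

```haskell
-- Hypothetical model of the Swish command line options; NOT the Swish source.
data Option
  = Help                        -- -?
  | FormatN3                    -- -n3
  | Input   (Maybe FilePath)    -- -i[=file]
  | Merge   (Maybe FilePath)    -- -m[=file]
  | Compare (Maybe FilePath)    -- -c[=file]
  | Output  (Maybe FilePath)    -- -o[=file]
  deriving (Eq, Show)

-- Parse a single option. 'Nothing' as the file name stands for
-- standard input or standard output, as described in the option list.
parseOption :: String -> Either String Option
parseOption "-?"  = Right Help
parseOption "-n3" = Right FormatN3
parseOption opt@('-' : c : rest) =
  case (lookup c fileOptions, rest) of
    (Just mk, "")                           -> Right (mk Nothing)
    (Just mk, '=' : file) | not (null file) -> Right (mk (Just file))
    _                                       -> Left ("unrecognized option: " ++ opt)
  where
    fileOptions = [('i', Input), ('m', Merge), ('c', Compare), ('o', Output)]
parseOption opt = Left ("unrecognized option: " ++ opt)

-- Parse a whole command string. Options are separated by white space
-- (an assumption: file names containing spaces are not handled here),
-- and the result keeps them in left-to-right order.
parseCommand :: String -> Either String [Option]
parseCommand = mapM parseOption . words

main :: IO ()
main = do
  print (parseCommand "-i=a.n3 -m=b.n3 -o")
  print (parseCommand "-x")
```

A driver built on this sketch would fold over the resulting list, threading the graph workspace from one option to the next.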
The Swish program terminates with a status code that indicates the final status of the operation(s) performed. Haskell distinguishes between a success status code whose value is not specified, assumed to be system dependent, and a failure code which is associated with an integer value. The status code values returned by Swish are:
- Success
- Operation completed successfully; graphs compare equal.
- 1
- Graphs compare different.
- 2
- Input data file has incorrect format.
- 3
- File access problem.
- 4
- Incorrect option in command line.
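The status codes above map naturally onto the standard Haskell type ExitCode from System.Exit, which has a constructor ExitSuccess carrying no value and a constructor ExitFailure carrying an integer. The sketch below shows one way to express the table; the type SwishStatus, its constructors, and the functions toExitCode and finish are hypothetical names chosen for illustration, not names from the Swish sources.

```haskell
import System.Exit (ExitCode (..), exitWith)

-- Hypothetical enumeration of the outcomes listed in the status code table.
data SwishStatus
  = StatusOk            -- operation completed; graphs compare equal
  | GraphsDiffer        -- graphs compare different
  | BadInputFormat      -- input data file has incorrect format
  | FileAccessProblem   -- file access problem
  | BadOption           -- incorrect option in command line
  deriving (Eq, Show, Enum, Bounded)

-- Translate an outcome into the process exit code described in the table.
toExitCode :: SwishStatus -> ExitCode
toExitCode StatusOk          = ExitSuccess
toExitCode GraphsDiffer      = ExitFailure 1
toExitCode BadInputFormat    = ExitFailure 2
toExitCode FileAccessProblem = ExitFailure 3
toExitCode BadOption         = ExitFailure 4

-- Terminate the program with the exit code for the given outcome.
finish :: SwishStatus -> IO a
finish = exitWith . toExitCode

main :: IO ()
main = mapM_ (print . toExitCode) [minBound .. maxBound]
```

The main function here only prints the mapping; a real program would call something like finish once its last option had been processed.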
Here are some example Swish command lines:
[[[To be provided; until then see the source files, notably SwishCommands.hs, GraphClass.hs, RDFGraph.hs, and the various test modules.]]]
The Swish software is distributed as a single ZIP archive. Start installation by creating an empty directory for the software, and extracting the content of the ZIP archive into that directory. Select the ZIP option that uses directory information from the archive so that the sub-directory structure is preserved.
The following sections deal with how to get the software running in different Haskell environments. The instructions relate to MS Windows operating systems, but it should be fairly obvious how to adapt the procedures for Unix/Linux systems.
Swish is written entirely in Haskell, and should work with any Haskell system that supports Haskell 98 together with the extensions noted below. The software has been tested using Hugs [12] (version November 2002), Glasgow Haskell Compiler (GHC) [13] (version 5.04.3) and the interactive version of GHC (GHCi).
The required extensions to standard Haskell-98 are:
Some freely available additional Haskell libraries are used. For convenience, these are included with the Swish software distribution, but they are not themselves part of the Swish software for licensing purposes. More details are given later.
My development has been performed mostly using Hugs on a 1.3GHz PC with 256MB of memory. For most purposes, this has been more than adequate. Some of the larger test cases, and the more perverse graph comparisons, may take several minutes to run on this platform (SwishTest takes about 20 minutes). In practice, applications are likely to be more demanding than the basic requirements of Swish.
The Swish software distribution includes the following files:
- Install directory
- Swish.html, Swish.xml: this documentation file, and XML source code.
- *.hs: Haskell source files (see software overview above).
- *Test.hs: unit test Haskell source files.
- *.bat: MS-Windows command files for building and testing the software using GHC.
- *.txt: additional information, including licensing details.
- Data subdirectory
- Contains Notation3 data files used by the SwishTest program.
- Parsec subdirectory
- Contains the Parsec library used by Swish.
- HUnit subdirectory
- Contains the HUnit library used by Swish test modules.
- Sort subdirectory
- Contains the Quicksort library used by Swish. (References to this module can be removed, and the standard Haskell function List.sort used in place of QuickSort.)
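The substitution mentioned for the Sort subdirectory could be made with a one-line shim such as the one sketched below. The name quickSort is a hypothetical stand-in for whatever function the Swish modules import from the Quicksort library; in current GHC releases the Haskell 98 module List is available as Data.List.

```haskell
import Data.List (sort)

-- 'quickSort' is a hypothetical name for the function that calling code
-- expects; here it simply delegates to the standard library sort.
quickSort :: Ord a => [a] -> [a]
quickSort = sort

main :: IO ()
main = print (quickSort [3, 1, 2 :: Int])
```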
Running the Swish software under Hugs is straightforward. The Hugs options -98 and +N must be specified.
Special steps that might help include:
runSwish "command options"
noting that the command line must be supplied as a Haskell string expression (e.g. in double quotes).
The full settings reported by my Hugs installation are:
Current settings: +fewuiRWXN -stgGl.qQkoOIHT -h5000000 -p"%s> " -r$$ -c40
Search path     : -P{Hugs}\libraries;{Hugs}\libraries\HUnit;{Hugs}\libraries\Parsec;{Hugs}\libraries\Sort
Project Path    :
Source suffixes : -S.hs;.lhs
Editor setting  : -E"C:\\Program Files\\TextPad 4\\TextPad.exe"
Preprocessor    : -F
Compatibility   : Hugs Extensions (-98)
Running the Swish software under GHCi is almost as easy as using Hugs. GHCi command line options used include '-fglasgow-exts' and '-iF:\Haskell\Lib\HUnit;F:\Haskell\Lib\Parsec;F:\Haskell\Lib\Sort' (adjusted according to the directories containing the library files). Working under MS-Windows, I find it convenient to create a desktop shortcut to run GHCi, specifying the Swish source directory and other options as properties of the shortcut.
To run a program in the GHCi command interpreter, follow the same procedure that is described for running a program under Hugs. The GHCi and Hugs command shells are very similar.
There is a GHCi initialization file, '.ghci', that, if placed in the appropriate startup directory, is read automatically by GHCi and defines some convenient commands for running the non-interactive GHC compiler from within the GHCi shell.
MS-Windows command scripts have been prepared to compile and run the Swish software in an MS-Windows command window. It should be straightforward to create Unix equivalents using information from these. The relevant files are:
The file ghcc.bat assumes a standard GHC installation, with the GHC compiler on the current search path, and will probably need to be edited to reflect the actual locations of the support libraries used.
Once the programs have been compiled and linked, they can be run in the usual way by entering the program name at a command prompt. The test programs do not expect any command line options and run to completion. The program Swish.exe takes command line options as described above.
A Swish installation under GHC can be tested by running the command script TestSwish.bat, and ensuring that all tests complete with zero errors. On a 1.7GHz PC running Windows 2000, the tests take a few minutes to complete.
To test the installation from an interactive shell, the test programs need to be loaded and executed individually. To confirm a successful installation, it is probably sufficient to load and run RDFGraphTest, which should complete quite quickly (about 30 seconds under Hugs on a 1.3GHz PC), then run SwishTest which takes about 20 minutes on the same system.
Swish uses some additional libraries that are not part of the Swish software, but which are included with the Swish software distribution for the convenience of users.
Please note that these support libraries are distributed under their own licensing terms and conditions, which I have reproduced below where available. Please contact the respective authors for further information.
Parsec [14] is a monadic parser combinator library for Haskell. I found it to be excellently documented and generally easy to use. It also serves as a useful introduction to using monads in Haskell.
Copyright 1999-2000, Daan Leijen. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
This software is provided by the copyright holders "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holders be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
Quicksort is part of a collection of sorting functions in Haskell, published by Ralf Hinze [16].
At the time of writing, I can find no claim for copyright or distribution licensing terms.
HUnit [15] is a unit testing framework for Haskell, loosely modelled on the JUnit framework that is popular with Java programmers.
Swish application code does not use HUnit, but the test programs do make extensive use of it.
HUnit is Copyright (c) Dean Herington, 2002, all rights reserved, and is distributed as free software under the following license.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Swish is very much a work-in-progress, and the present release is a first step along a path with many possible options for future developments.
The immediate next step is to use the Swish framework in the construction of some special-purpose RDF inference tools. My intent is to revisit my earlier work [19], and learn how that work may be served by using Haskell in place of a packaged RDF inference tool.
The Swish code itself is far from perfect, and much additional functionality and improvement is possible. But it does pass an extensive array of tests, and I believe it is sufficiently stable and functional for this initial release.
The software distribution contains a file named TODO.TXT, which lists a number of specific possible enhancements that have been identified to date.
I would like to thank the following, whose previous work has been most helpful to me (though, of course, they bear no responsibility for the failings of my work):
This document has been authored in XML using the format described in RFC 2629 [3], and converted to HTML using the XML2RFC utility developed by Marshall Rose (http://xml.resource.org/).
Graham Klyne
Nine by Nine
14 Chambrai Close
Appleford
Abingdon, Oxon OX14 4NT
UK
Phone: +44 1235 848491
Fax: +44 1235 848562
EMail: GK-swish@ninebynine.org
URI: http://www.ninebynine.net/
- 2003-05-30:
- Document initially created.
$Log: swish-0.1.html,v $
Revision 1.3 2003/06/03 16:22:57 graham
Typo fixes to web site CVS
Revision 1.6 2003/06/03 16:17:44 graham
Fix another typo
Revision 1.5 2003/06/03 11:31:13 graham
Fix typos in documentation
Revision 1.4 2003/05/31 00:11:21 graham
Fix various typos and omissions
Revision 1.3 2003/05/30 19:12:57 graham
Fixed some document typos and added RDF semantics reference
Revision 1.2 2003/05/30 18:37:25 graham
First formatted version of Swish documentation
Revision 1.1 2003/05/30 16:41:22 graham
Swish documentation, initial version.