Open Data Service, University of Southampton

Our Software

Our service is at the cutting edge, and running it poses many challenges, not least that no third-party software exists to do the sort of things we need! For this reason we have written our own tools and, wherever possible, released them as open source. We hope this benefits the wider Open Data community and helps others set up their own Open Data services without having to relive all the problems we have already solved.

Publishing Pipeline

Our data is republished at various intervals throughout the week by automated scripts. The software that runs and manages these scripts is called Hedgehog. Each dataset has a 'hopper' directory in which its collector script runs. Hedgehog manages downloading remote files, converting them into linked data, publishing the results to the triplestore and the website, and generating metadata and provenance data. It relies on several supplementary tools, the most prominent of which is Grinder, a tool that generates XML from other data formats and applies stylesheets to it (for example, to convert it to RDF/XML). We also have custom tools for repetitive tasks such as converting between spreadsheet file formats and connecting to databases.
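
As a rough illustration of the first half of what Grinder does, the Python sketch below turns CSV text into a simple XML document using only the standard library. It is not Grinder's code or output format: the rows, row and cell element names and the rows_to_xml function are hypothetical. In a pipeline like ours, a stylesheet would then transform XML of this kind into RDF/XML; that step is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(csv_text: str) -> str:
    """Turn CSV text into a flat XML document, one <row> per record.

    The element names used here (rows/row/cell) are hypothetical and are
    not Grinder's actual output format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("rows")
    for record in reader:
        row_el = ET.SubElement(root, "row")
        for heading, value in record.items():
            # Keep the column heading in an attribute rather than using it as
            # a tag name, so headings with spaces or punctuation stay valid XML.
            cell = ET.SubElement(row_el, "cell", name=heading)
            cell.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative input only; not real university data.
    sample = "id,name\n1,Example Building\n2,Another Building\n"
    print(rows_to_xml(sample))
```

Storing headings as attributes means a stylesheet can match on the column name without the converter having to sanitise arbitrary spreadsheet headings into legal element names.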

Another essential requirement in managing this website is the ability to convert between RDF formats and to reason over linked data. For this we use Graphite, a PHP library that simplifies working with linked data and can let a developer retrieve RDF from a triplestore without writing a line of SPARQL. It is designed to feel similar to jQuery, and is built on ARC2. When we need to delve in and query the data directly, we use PHP-SPARQL-Lib.
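
PHP-SPARQL-Lib is a PHP library, so the following is only a Python sketch of the same general task, not its API. It builds a query request following the W3C SPARQL 1.1 Protocol (an HTTP GET with a query parameter) and flattens a response in the standard SPARQL 1.1 Query Results JSON Format. The endpoint URL and the function names build_query_request and bindings_to_rows are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain dicts of variable -> value."""
    doc = json.loads(results_json)
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable left unbound in a solution is simply absent from it,
        # so each row only contains the variables that were bound.
        rows.append({var: term["value"] for var, term in binding.items()})
    return rows


if __name__ == "__main__":
    req = build_query_request(
        "https://sparql.example.org/sparql",  # hypothetical endpoint
        "SELECT ?s ?label WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 5",
    )
    print(req.full_url)
    # Against a real endpoint, the response body could then be passed on:
    # rows = bindings_to_rows(urllib.request.urlopen(req).read().decode("utf-8"))
```

Dropping the type and datatype information keeps the sketch short; code that needs to tell URIs from literals would keep the full term dictionaries instead.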

External Links

We use GitHub to manage our development. You can download, and even contribute to, our software at the following links.

Hedgehog

https://github.com/ads04r/Hedgehog

Grinder

https://github.com/cgutteridge/Grinder

Graphite

https://github.com/cgutteridge/Graphite

PHP-SPARQL-Lib

https://github.com/cgutteridge/PHP-SPARQL-Lib

There are other tools that make our lives easier. SharePerltopus is a tool for accessing Microsoft SharePoint from Perl; our friends at the University of Oxford's Open Data service have a similar tool written in Python. TripleChecker checks RDF documents for typos and common mistakes. We anticipate that this functionality will eventually be built into Hedgehog.
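
To give a feel for the kind of check TripleChecker performs, here is a small Python sketch that flags predicates whose namespace is unknown, or whose local name is not a known term in a known namespace, and suggests close spellings. This is not TripleChecker's implementation: the VOCAB table is a hypothetical, hand-picked whitelist, and triples are passed in as plain tuples rather than parsed from an RDF document.

```python
import difflib

# Hypothetical whitelist: namespace -> local names we accept. A real checker
# would load complete vocabularies instead of a few hard-coded terms.
VOCAB = {
    "http://xmlns.com/foaf/0.1/": {"name", "homepage", "mbox", "Person"},
    "http://www.w3.org/2000/01/rdf-schema#": {"label", "comment", "seeAlso"},
}


def split_uri(uri):
    """Split a URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def check_predicates(triples, vocab=VOCAB):
    """Report suspicious predicates as (predicate, problem, suggestions).

    `triples` is an iterable of (subject, predicate, object) tuples; each
    distinct predicate is reported at most once, in first-seen order.
    """
    report = []
    seen = set()
    for _subject, predicate, _object in triples:
        if predicate in seen:
            continue
        seen.add(predicate)
        namespace, local = split_uri(predicate)
        terms = vocab.get(namespace)
        if terms is None:
            report.append((predicate, "unknown namespace", []))
        elif local not in terms:
            # Compare local names only: whole URIs share long namespace
            # prefixes, which would make every term look similar.
            hints = difflib.get_close_matches(local, sorted(terms), n=3, cutoff=0.6)
            report.append((predicate, "unknown term", hints))
    return report


if __name__ == "__main__":
    data = [
        ("http://example.org/a", "http://xmlns.com/foaf/0.1/nmae", "Alice"),
        ("http://example.org/a", "http://example.org/ns#colour", "red"),
    ]
    for problem in check_predicates(data):
        print(problem)
```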

SharePerltopus

https://github.com/cgutteridge/SharePerltopus

TripleChecker

https://github.com/cgutteridge/TripleChecker

Finally, the source code for this site, as well as a few others based on it, is available on GitHub.

data.southampton.ac.uk

https://github.com/ads04r/DataSoton

data.susu.org

https://github.com/ads04r/data-susu