Open Data Service, University of Southampton

Frequently Asked Questions

If you have a question not in the list, please get in touch.

Why does the site occasionally break?
We like to think of this site as being in "beta". By this we mean that the service is in active development. We are learning as we go, so parts of it are regularly added and changed, and occasionally something breaks as a result. That's not a good way to build a suspension bridge, but it is very cost-effective and practical for web development. When we started this site we had a brief, rather than a detailed set of requirements.
Some of the pages on the site just look like a badly organised list of stuff. Why is that?
The site is almost entirely data-driven. There are some pages, such as this one, that are hand-written by humans, but most of the pages on the site are automatically generated from raw data. We've tried to provide a friendly viewer for as many of the different data types as possible, or at the very least an explanation, but occasionally this may not work quite as well as we expect. That said, we're trying to make the site as human-readable as possible, so if you find any confusing pages then please do get in contact and let us know.
What is a URI?
First of all, note that it looks very like "URL". URLs are a subset of URIs. A URI identifies a single concept, but unlike a URL that concept is not limited to a document or file on the Web. It still looks like a web address, but don't let that confuse you. It can identify things which cannot be turned into a series of ones and zeroes and sent over the Internet, for example Bencraft Hall Bar (http://id.southampton.ac.uk/building/81D) or the genus Velociraptor (http://www.bbc.co.uk/nature/genus/Velociraptor). Most University of Southampton URIs will start with "http://id.southampton.ac.uk/".
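As a rough illustration of the point above, here is a minimal Python sketch that takes a URI apart and checks whether it uses the Southampton prefix. The function name describe_uri and the shape of its result are our own invention for this example, and remember that only most, not all, University URIs start with that prefix.

```python
from urllib.parse import urlparse

# Prefix quoted in the answer above; most, but not all, Southampton URIs use it.
SOTON_ID_PREFIX = "http://id.southampton.ac.uk/"


def describe_uri(uri: str) -> dict:
    """Split a URI into parts and flag whether it uses the Southampton prefix."""
    parts = urlparse(uri)
    return {
        "host": parts.netloc,
        "path": parts.path,
        "is_southampton_id": uri.startswith(SOTON_ID_PREFIX),
    }


if __name__ == "__main__":
    for uri in (
        "http://id.southampton.ac.uk/building/81D",
        "http://www.bbc.co.uk/nature/genus/Velociraptor",
    ):
        print(describe_uri(uri))
```

Note that nothing here fetches anything: the URI is treated purely as an identifier, which is the whole idea.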
What formats are you publishing in?
Where possible we aim to provide the data as full ★★★★★ data. However, that may be a long process, and we would rather make good data available now than perfect data the day after tomorrow. Most datasets are available as RDF+XML and Turtle and, where possible, we also provide the raw data, which is almost always one or more comma-separated value files or Excel documents.
What file formats are used on this site?
Many of the HTML pages are constructed from RDF data. If the page was constructed from triples, the "get the data" box will contain links to alternative formats. The formats RDF+XML (.rdf), Turtle (.ttl) and N-Triples (.nt) all express exactly the same data. For a programmer new to RDF, "N-Triples" is the easiest to get to grips with as it's just the raw data. Depending on the data, we may also provide other formats, such as RSS for news or event feeds, Google KML files for geospatial data, or ICS files for temporal data.
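To show why N-Triples is a gentle starting point, here is a minimal Python sketch that reads simple N-Triples lines with nothing more than a regular expression. It is not a full parser: it assumes every line is a URI subject, a URI predicate, and either a URI or a plain quoted literal. The two sample triples are made up for illustration (including the http://example.org/Building type) and are not copied from one of our datasets.

```python
import re

# Matches one simple N-Triples statement: <subject> <predicate> object .
# The object may be a <URI> or a plain "literal". Blank nodes, language tags,
# datatypes and escaped quotes are deliberately not handled in this sketch.
TRIPLE_RE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) tuples for each statement in text."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        subject, predicate, uri_obj, literal_obj = match.groups()
        obj = uri_obj if uri_obj is not None else literal_obj
        triples.append((subject, predicate, obj))
    return triples


# Illustrative data only, not taken from a published dataset.
SAMPLE = '''
<http://id.southampton.ac.uk/building/81D> <http://www.w3.org/2000/01/rdf-schema#label> "Bencraft Hall Bar" .
<http://id.southampton.ac.uk/building/81D> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Building> .
'''

if __name__ == "__main__":
    for triple in parse_ntriples(SAMPLE):
        print(triple)
```

For real work you would want a proper RDF library rather than a regular expression, but the sketch shows that each line really is just one statement of raw data.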
What technologies are you using?
We use Grinder, rapper, and some simple scripts to publish the datasets. Grinder was developed for this project, but is available as a simple way to convert tabular data into RDF. We use 4store as our data store and SPARQL endpoint, with an arc2 front page to make it easier to use. To provide the interface we use PHP and our Graphite and sparqllib libraries, plus OpenStreetMap for the maps.
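If you have not met a SPARQL endpoint before, the following Python sketch shows roughly what asking it a question looks like: under the SPARQL protocol the query text travels in a "query" parameter on the request URL. The endpoint address below is a placeholder, since this FAQ does not give the real one, and the query assumes the building has an rdfs:label, which we have not checked. The sketch only builds the URL and does not send anything.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint address: the real endpoint URL is not given in this FAQ.
ENDPOINT = "http://sparql.example.org/sparql"

# Assumes the building carries an rdfs:label; that is a guess for illustration.
QUERY = """
SELECT ?label WHERE {
  <http://id.southampton.ac.uk/building/81D>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
}
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Recover the SPARQL text from a URL built by build_query_url."""
    return parse_qs(urlparse(url).query)["query"][0]


if __name__ == "__main__":
    url = build_query_url(ENDPOINT, QUERY)
    print(url)
    print(extract_query(url))
```

Our own pages use PHP and sparqllib for this job; Python is used here only because it keeps the example short.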
Are you the only university to do this?
No, and it would be pointless if we were. We are working with our peers at other universities that are planning and providing open data, to design good practice and tools. However, Southampton is a pioneer in the field: the University is a leader in Open Access to research, and Electronics and Computer Science have been publishing open data about their infrastructure since 2006. This site is built by the same team, drawing on the lessons learned from the ECS project.
What licence will you publish under?
Datasets will mostly be published under the Open Government Licence, or other licences conforming to the Open Definition. In some cases this may not be possible.
Aren't you worried about the legal and social risks of publishing your data?
No, we are not worried. We will consider carefully the implications of what we are publishing and manage our risk accordingly. We have no intention of breaking the UK Data Protection Act or other laws. Much of what we publish is going to be data which was already available to the public, but just not as machine-readable data. There are risks involved, but as a university it's our role to try exciting new things!