[open-bibliography] Dataset Metadata

Wed Oct 20 08:39:53 UTC 2010

First of all, just to make sure people have heard of it: voID 
http://semanticweb.org/wiki/VoiD (from "Vocabulary of Interlinked 
Datasets") is an RDF-based schema for describing linked datasets. It 
sounds like it has a heavy overlap with dcat.

Secondly, unless I've missed it, there's something missing from both, 
and I don't think it can be usefully dealt with by schema; it requires 
convention instead.

What's missing is a way to tell people what they can expect to get back 
if they resolve the URI of the dataset, or the URIs of elements within 
the dataset. What I'm proposing is that people should mint RDF classes 
which describe a certain pattern or structure. If that pattern is not 
specific to their archive, they should ideally use a neutral domain for 
it so that it can be reused by other organisations without stigma.

I plan to write a blog post on this soon, but by way of example, here's 
the basic ontology for an EPrints repository:  
http://www.eprints.org/ontology/ -- you'll see that I indicate what 
you'll get back if you resolve a URI of class EPrint or Repository. This 
is slightly semantically shonky as it describes a property of the URI 
and not the concept represented by the URI. sameAs does not apply in 
this case!

While some datasets may be global, and worth the time and effort to 
build custom interfaces for, others are not. For smaller and local 
datasets, such as a bibliography, it's better to indicate what standard 
pattern you are using.

A) For example, the simplest would be (I'm guessing) a pattern where 
the bibliography is a single RDF+XML document containing an unordered 
collection of records, each with flat Dublin Core metadata.

B) The other simple one is a URI which, if resolved, will return an 
RDF+XML document containing a bunch of triples relating that URI to a 
set of bibliographic records held on other systems. Simple Dublin Core 
may be included, but should not be relied on. rdf:type may be available 
to indicate what the records may be, or the pattern(s) they fulfill, but 
again optionally.

Basically, it would be really useful for a consuming application to know 
whether it's dealing with an (A) or a (B). If (B), then it probably 
needs to resolve all the URIs; in the case of (A) this is pointless. 
This may not be a big deal on a single list of 10 items, but 
machine-readable cues about the value of resolving linked data URIs will 
be important as systems scale.
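
To make the (A)/(B) distinction concrete, here is a rough sketch of how 
a consumer might use such a cue. It is only an illustration under 
assumptions: the two pattern class URIs are made up (nothing like them 
has been minted yet), the triples are plain Python tuples rather than 
the output of a real RDF parser, and treating any object starting with 
"http" as a resolvable URI is a crude stand-in for a proper URI/literal 
distinction. Falling back to "resolve everything" when no pattern is 
declared is my own guess at a sensible default.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical pattern classes on a neutral domain (assumed, not real).
PATTERN_A = "http://example.org/patterns/FlatBibliography"
PATTERN_B = "http://example.org/patterns/LinkedBibliography"


def uris_worth_resolving(dataset_uri, triples):
    """Return the record URIs a consumer should go and resolve.

    triples is an iterable of (subject, predicate, object) strings
    obtained by resolving dataset_uri.
    """
    triples = list(triples)  # we iterate over the triples twice

    # Collect every rdf:type asserted on the dataset URI itself.
    types = {o for s, p, o in triples
             if s == dataset_uri and p == RDF_TYPE}

    if PATTERN_A in types:
        # (A): the document already holds all the flat metadata,
        # so resolving the record URIs would be pointless.
        return []

    # (B), or no declared pattern: follow every URI the dataset
    # points at, skipping the rdf:type statements themselves.
    return sorted({o for s, p, o in triples
                   if s == dataset_uri
                   and p != RDF_TYPE
                   and o.startswith("http")})
```

A consumer would call uris_worth_resolving() once per dataset document 
and only fetch the URIs it returns, so an (A) bibliography costs one 
request no matter how many records it lists.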

-- 
Christopher Gutteridge -- http://id.ecs.soton.ac.uk/person/1248

/ Lead Developer, EPrints Project, http://eprints.org/
/ Web Projects Manager, ECS, University of Southampton, http://www.ecs.soton.ac.uk/
/ Webmaster, Web Science Trust, http://www.webscience.org/