[open-bibliography] Dataset Metadata
Christopher Gutteridge
cjg at ecs.soton.ac.uk
Wed Oct 20 08:39:53 UTC 2010
First of all, just to make sure people have heard of voID:
http://semanticweb.org/wiki/VoiD. voID (from "Vocabulary of Interlinked
Datasets") is an RDF-based schema for describing linked datasets. It
sounds like it has a heavy overlap with dcat.
Secondly, unless I've missed it, there's something missing from both,
and I don't think it can be usefully dealt with by schema; rather, it
requires convention.
What's missing is a way to tell people what they can expect to get back
if they resolve the URI of the dataset, or the URIs of elements within
the dataset. What I'm proposing is that people should mint RDF classes
which describe a certain pattern or structure. If that pattern is not
specific to their archive, they should ideally use a neutral domain for
it so that it can be reused by other organisations without stigma.
I plan to write a blog post on this soon, but by way of example, here's
the basic ontology for an EPrints repository:
http://www.eprints.org/ontology/ -- you'll see that I indicate what
you'll get back if you resolve a URI of class EPrint or Repository. This
is slightly semantically shonky as it describes a property of the URI
and not the concept represented by the URI. sameAs does not apply in
this case!
While some datasets may be global, and worth the time and effort to
build custom interfaces for, others are not. For smaller and local
datasets, such as a bibliography, it's better to indicate what standard
pattern you are using.
A) For example, the simplest would be (I'm guessing) a pattern where
the bibliography is a single RDF+XML document containing an unordered
collection of records, each containing flat Dublin Core metadata.
B) The other simple one is a URI which, if resolved, will return an
RDF+XML document containing a bunch of triples relating that URI to a
set of bibliographic records held on other systems. Simple Dublin Core
may be included, but should not be relied on. rdf:type may be available
to indicate what the records are, or the pattern(s) they fulfil, but
again only optionally.
Basically, it would be really useful for a consuming application to know
whether it's dealing with an (A) or a (B). If (B), then you probably
need to resolve all the URIs; in the case of (A) this is pointless. This
may not be a big deal on a single list of 10 items, but machine-readable
cues about the value of resolving linked data URIs will be important as
systems scale.
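
To make the (A)/(B) distinction concrete, here is a rough sketch of how a
consumer might use such a cue. Everything under example.org is made up
for illustration: the two pattern classes and the "record" property are
hypothetical names, not part of voID, dcat or the EPrints ontology.
Triples are plain tuples so the sketch runs without an RDF library.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical pattern classes, minted here purely for illustration.
PATTERN_A = "http://example.org/patterns#FlatDublinCoreCollection"
PATTERN_B = "http://example.org/patterns#LinkedRecordList"
# Hypothetical property relating a bibliography to its records.
HAS_RECORD = "http://example.org/patterns#record"


def uris_to_resolve(dataset_uri, triples):
    """Return the record URIs worth dereferencing for this dataset.

    triples is an iterable of (subject, predicate, object) string tuples
    already parsed from the document found at dataset_uri.
    """
    triples = list(triples)
    # Collect every rdf:type asserted about the dataset itself.
    types = {o for s, p, o in triples if s == dataset_uri and p == RDF_TYPE}
    if PATTERN_B in types:
        # Pattern (B): the records live elsewhere, so follow each link.
        return sorted(
            o for s, p, o in triples if s == dataset_uri and p == HAS_RECORD
        )
    # Pattern (A): everything is already in this document.
    # An undeclared pattern also lands here; a real consumer would
    # have to fall back on its own policy in that case.
    return []
```

A consumer would call uris_to_resolve() once per bibliography and only
fetch what comes back, so a type (A) bibliography costs a single request
however many records it holds.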
--
Christopher Gutteridge -- http://id.ecs.soton.ac.uk/person/1248
/ Lead Developer, EPrints Project, http://eprints.org/
/ Web Projects Manager, ECS, University of Southampton, http://www.ecs.soton.ac.uk/
/ Webmaster, Web Science Trust, http://www.webscience.org/