[ckan-discuss] Harvesting Dublin Core documents

William Waites ww at eris.okfn.org
Fri Nov 26 12:26:07 GMT 2010

* [2010-11-24 19:26:41 +0000] John Bywater <john.bywater at appropriatesoftware.net> wrote:

] >it has even been suggested that this be the native
] >interchange format between CKAN instances for aggregation.
] I'm not exactly sure what is meant by "the native interchange
] format", but I was wondering whether it would benefit CKAN users for
] CKAN to support aggregating heterogeneous metadata formats.

Perhaps I should rephrase this to "the native interchange format for
LOD2", where we can hope to have CKAN instances and other software
chattering away about what datasets are out there and being aggregated
and searched etc.

] When aggregating, if a package has been written from an ingested 
] document, server-CKAN could present the metadata records for aggregation 
] in their original form, and client-CKAN could write a package as if
] the document had been ingested for the first time. That would pretty much
] guarantee lossless transmission through a chain, if there ever was such 
] a thing.

What about the search-engine-as-aggregator strategy? Sort of the way
one might submit a URI to Sindice when something has changed, or use
some sort of pingback (semantic or otherwise) mechanism. This is more
of a push model than a pull model, though.

] For a locally edited package (which would have no ingested document) I
] would think that directly passing the native JSON format for locally 
] edited packages would be the simplest thing to do. Version differences 
] are already supported via the versioning of the API. You could even put 
] the aggregator outside the API, reading from one API and writing to 
] another. (I should admit, I don't yet see what would prompt me to do 
] that with DCat?)

Where we want to start having namespaced metadata (the standard way
to do this is RDF, but I can easily imagine another convention). A
slightly contrived example: CKAN instance A has a convention of using
an extra called "geography" to refer to something about the spatial
extent of a dataset (corresponding to dc:spatial). CKAN instance B has
a convention of using an extra called "geography" to mean that the
dataset is used as course material in the Geography Department (maybe
corresponding to dcat:keyword). Before aggregation can happen, these
conventions need to be normalised in some way. The information needed
to do this resides close to the CKAN instances. And once normalised /
formalised, one might find that the CKAN-JSON format is insufficient
or inconvenient for encoding the metadata for transmission.
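
To make the "geography" example concrete, here is a minimal sketch of
what such a normalisation step might look like. Everything in it is
hypothetical: the instance names, the INSTANCE_MAPPINGS table and the
normalise_extras function are invented for illustration and are not
part of CKAN. It only assumes that each instance publishes its own
mapping from local extra names to namespaced terms.

```python
# Hypothetical per-instance conventions: local extra name -> namespaced term.
# These mappings are assumptions for illustration, not CKAN configuration.
INSTANCE_MAPPINGS = {
    "instance-a": {"geography": "dc:spatial"},
    "instance-b": {"geography": "dcat:keyword"},
}


def normalise_extras(instance, extras):
    """Rewrite an instance's extras so that every key is a namespaced term."""
    mapping = INSTANCE_MAPPINGS.get(instance, {})
    normalised = {}
    for key, value in extras.items():
        # Extras with no known mapping are kept under an instance-specific
        # prefix, so identically named extras from different instances
        # can never collide after aggregation.
        term = mapping.get(key, f"{instance}:{key}")
        normalised.setdefault(term, []).append(value)
    return normalised


a = normalise_extras("instance-a", {"geography": "Scotland"})
b = normalise_extras("instance-b", {"geography": "Geography Department"})
print(a)
print(b)
```

After this step the two "geography" extras end up under different
terms, so an aggregator can merge the records without confusing them.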


William Waites
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664
