[ckan-discuss] Harvesting Dublin Core documents

Wed Nov 24 19:26:41 GMT 2010

Hi Will,

Just a quick thought about aggregation behaviour.

William Waites wrote:
> it has even be suggested that this be the native
> interchange format between CKAN instances for aggregation.
> 

I'm not exactly sure what is meant by "the native interchange 
format", but I was wondering whether CKAN users would benefit if 
CKAN supported aggregating heterogeneous metadata formats.

That is, CKAN could ingest a document in one of many supported formats, 
write a package with values it reads from the document, and then keep a 
copy of the ingested document.

When aggregating, if a package has been written from an ingested 
document, server-CKAN could present the metadata record for aggregation 
in its original form, and client-CKAN could write a package as if the 
document were being ingested for the first time. That would pretty much 
guarantee lossless transmission through a chain of CKAN instances, if 
there ever was such a thing.
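
To make the idea concrete, here is a rough sketch in Python. It is not 
existing CKAN code: the Package class, the "kv" format, and the functions 
parse_key_value, ingest and present_for_aggregation are all made up for 
illustration. The only point is that each hop re-ingests the original 
document rather than a re-serialised package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


def parse_key_value(raw: str) -> Dict[str, str]:
    """Hypothetical parser for a toy 'Key: value' metadata format."""
    fields: Dict[str, str] = {}
    for line in raw.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


# Hypothetical registry of supported formats (assumption: one parser each).
PARSERS: Dict[str, Callable[[str], Dict[str, str]]] = {"kv": parse_key_value}


@dataclass
class Package:
    fields: Dict[str, str]
    source_format: Optional[str] = None    # None for locally edited packages
    source_document: Optional[str] = None  # verbatim copy of what was ingested


def ingest(raw: str, fmt: str) -> Package:
    """Write a package from a document and keep a copy of the document."""
    return Package(fields=PARSERS[fmt](raw), source_format=fmt, source_document=raw)


def present_for_aggregation(pkg: Package) -> Tuple[str, str]:
    """Server side: hand out the ingested document in its original form."""
    if pkg.source_format is None or pkg.source_document is None:
        raise ValueError("package was not written from an ingested document")
    return pkg.source_format, pkg.source_document


# A chain of three instances: each client ingests what the server presents.
original = "Title: Example dataset\nCreator: Someone\n"
first = ingest(original, "kv")
second = ingest(present_for_aggregation(first)[1], present_for_aggregation(first)[0])
third = ingest(present_for_aggregation(second)[1], present_for_aggregation(second)[0])
```

Because the stored document is passed on untouched, the last instance in 
the chain holds exactly the same document as the first, whatever each 
instance's package model happened to keep or drop.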

For a locally edited package (which would have no ingested document), I 
would think that directly passing the native JSON format would be the 
simplest thing to do. Version differences 
are already supported via the versioning of the API. You could even put 
the aggregator outside the API, reading from one API and writing to 
another. (I should admit, I don't yet see what would prompt me to do 
that with DCat?)
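
As a rough illustration of an aggregator sitting outside the API, the 
sketch below copies JSON package payloads from a reader to a writer 
without touching them. The aggregate function and the two callables are 
hypothetical stand-ins; real code would replace them with calls to the 
source and target CKAN APIs, which are not shown here.

```python
import json
from typing import Callable, Iterable, List


def aggregate(read_packages: Callable[[], Iterable[str]],
              write_package: Callable[[str], None]) -> int:
    """Copy JSON package payloads from one side to the other, unchanged."""
    copied = 0
    for payload in read_packages():
        json.loads(payload)     # only checks the payload is well-formed JSON
        write_package(payload)  # forwarded verbatim, no re-serialisation
        copied += 1
    return copied


# In-memory stand-ins for "reading from one API and writing to another".
source: List[str] = [json.dumps({"name": "example-package", "title": "Example"})]
target: List[str] = []
count = aggregate(lambda: source, target.append)
```

Since the payload is never rewritten along the way, nothing is lost 
between the two ends.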

Of course, we could (somewhere) add more support for presenting CKAN 
packages in different formats. We could present everything as one format 
or another, but it would be tricky to have a lossless homogenisation. So 
that might not be the way to do aggregation.

As I said, I'm not sure what is meant by "the native interchange 
format", and I'm certainly not the world's expert on DCat, but I hope 
these considerations were at least interesting to read. :-)

Best wishes,

John.