[ckan-discuss] Harvesting Dublin Core documents

William Waites ww at eris.okfn.org
Wed Nov 24 16:02:43 GMT 2010


* [2010-11-24 15:45:35 +0000] John Bywater <john.bywater at appropriatesoftware.net> wrote:


] Although we had (and still have!) XSLT code to transform GEMINI
] documents to Dublin Core, when I was implementing the GEMINI harvesting,
] I was given a large set of XPaths that meant I could pick out CKAN
] Package attribute values from a GEMINI document. The XPaths were the
] basis of an extended harvesting discussion with the client, so I decided
] to keep things simple, and not to use the XSLT.
] 
] Hence, CKAN still isn't able to harvest Dublin Core documents. I'd like
] to fix this. Hopefully, the only missing piece is the set of XPaths for
] picking out CKAN Package attribute values from a Dublin Core document.
] 
] Could we try to identify these XPaths? Those for GEMINI are here:
] http://ckan.org/browser/ckan/model/harvesting.py#L588

John, I'd like to reframe the question. What the XSLT transform did
was convert ISO19139 to RDF/XML. The Dublin Core vocabulary was used
for some elements therein; other vocabularies were also used (this was
well before the DCat vocabulary, which is basically some extensions to
DC for catalogues).

Attempting to identify XPaths is a category error. RDF/XML is only
one serialisation, and even within just that one the same data can be
represented in any number of ways. Rather, what you do is read the RDF
and then process it programmatically, either with SPARQL or directly
according to the library or bindings that you are using (in our case,
rdflib).
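
To make the point concrete, here is a minimal sketch using only the
Python standard library. The two documents below are my own
illustration (the example.org subject and the title are made up); both
are RDF/XML for the same single triple, one writing the property as a
child element and the other as a property attribute. A path written
against the first form finds nothing in the second, whereas an RDF
parser should yield the same triple for both.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"

# Both documents encode the same triple (illustrative data only):
#   <http://example.org/dataset/1> dct:title "Example dataset"

# Form 1: the property is written as a child element.
AS_CHILD_ELEMENT = """<rdf:RDF xmlns:rdf="%s" xmlns:dct="%s">
  <rdf:Description rdf:about="http://example.org/dataset/1">
    <dct:title>Example dataset</dct:title>
  </rdf:Description>
</rdf:RDF>""" % (RDF, DCT)

# Form 2: the same property, abbreviated as an attribute.
AS_ATTRIBUTE = """<rdf:RDF xmlns:rdf="%s" xmlns:dct="%s">
  <rdf:Description rdf:about="http://example.org/dataset/1"
                   dct:title="Example dataset"/>
</rdf:RDF>""" % (RDF, DCT)

# A path that assumes form 1 (ElementTree's limited XPath subset).
TITLE_PATH = "{%s}Description/{%s}title" % (RDF, DCT)


def title_via_path(document):
    """Return the title found by the fixed path, or None if it misses."""
    root = ET.fromstring(document)
    return root.findtext(TITLE_PATH)


if __name__ == "__main__":
    print(title_via_path(AS_CHILD_ELEMENT))  # finds the title
    print(title_via_path(AS_ATTRIBUTE))      # misses: same data, other form
```

Typed node elements (a dcat:Dataset element in place of
rdf:Description) and nested versus referenced resources are further
variations that a fixed path would also miss.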

We do have a requirement for LOD2 to have CKAN be able to ingest RDF
data using DCat; it has even been suggested that this be the native
interchange format between CKAN instances for aggregation.

So, without touching the codebase of CKAN itself, the task would seem
to be to take a DCat document (examples [1][2][3]) and use the API to
create/update packages.
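
A rough sketch of what that could look like, under several assumptions
of mine rather than anything settled: the predicate-to-field mapping,
the DCat namespace URI, the "/rest/package" path and the Authorization
header are all placeholders to be checked against the documents and
the CKAN API actually in use, and the function names are hypothetical.
The triples are plain (subject, predicate, object) tuples; triples read
with rdflib should fit as well once converted with str(), which the
code does.

```python
import json
import urllib.request

DCT = "http://purl.org/dc/terms/"
# Assumption: substitute whichever DCat namespace the documents use.
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical mapping from RDF predicates to CKAN package fields.
FIELD_FOR_PREDICATE = {
    DCT + "title": "title",
    DCT + "description": "notes",
    DCT + "identifier": "name",
}
KEYWORD = DCAT + "keyword"


def package_from_triples(triples, subject):
    """Collect the triples about one subject into a package dict."""
    package = {"tags": []}
    for s, p, o in triples:
        if str(s) != subject:
            continue  # triple describes some other resource
        predicate = str(p)
        if predicate in FIELD_FOR_PREDICATE:
            package[FIELD_FOR_PREDICATE[predicate]] = str(o)
        elif predicate == KEYWORD:
            package["tags"].append(str(o))
    package["tags"].sort()  # triple order is not significant in RDF
    return package


def build_create_request(api_base, api_key, package):
    """Build (but do not send) a POST request carrying the package.

    The endpoint path and header name are assumptions, not confirmed
    against any particular CKAN version.
    """
    return urllib.request.Request(
        api_base.rstrip("/") + "/rest/package",
        data=json.dumps(package).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": api_key},
        method="POST",
    )
```

Sending the request would then be a call to urllib.request.urlopen();
an update rather than a create would presumably address the existing
package by name instead.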

Sound reasonable?

-w

[1] http://semantic.ckan.net/package/aprsworld.n3
[2] http://semantic.ckan.net/package/econ-alfred.rdf
[3] http://catalogue.data.gov.uk/doc/dataset/asbo_counts.nt
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664


