[ckan-discuss] Harvesting Dublin Core documents

John Bywater john.bywater at appropriatesoftware.net
Wed Nov 24 16:12:29 GMT 2010


William Waites wrote:
> * [2010-11-24 15:45:35 +0000] John Bywater <john.bywater at appropriatesoftware.net> wrote:
> 
> 
> ] Although we had (and still have!) XSLT code to transform GEMINI 
> ] documents to Dublin Core, when I was implementing the GEMINI harvesting, 
> ] I was given a large set of XPaths that meant I could pick out CKAN 
> ] Package attribute values from a GEMINI document. The XPaths were the 
> ] basis of an extended harvesting discussion with the client, so I decided 
> ] to keep things simple, and not to use the XSLT.
> ] 
> ] Hence, CKAN still isn't able to harvest Dublin Core documents. I'd like 
> ] to fix this. Hopefully, the only missing piece is the set of XPaths for 
> ] picking out CKAN Package attribute values from a Dublin Core document.
> ] 
> ] Could we try to identify these XPaths? Those for GEMINI are here:
> ] http://ckan.org/browser/ckan/model/harvesting.py#L588
> 
> John, I'd like to reframe the question. What the XSLT transform did
> was transform ISO19139 to RDF/XML. The Dublin Core vocabulary was used
> for some elements therein, other vocabularies were also used (this was
> well before the DCat vocabulary - DCat is basically some extensions to
> DC for catalogues).
> 
> Attempting to identify XPaths is a category error - RDF/XML is only
> one serialisation and even within just that one the same data can be
> represented in any number of ways. Rather what you do is read the RDF
> and then process it programmatically either with SPARQL or directly
> according to the library or bindings that you are using (in our case,
> rdflib).
> 
> We do have a requirement for LOD2 to have CKAN be able to ingest RDF
> data using DCat; it has even been suggested that this be the native
> interchange format between CKAN instances for aggregation.
> 
> So, without touching the codebase of CKAN itself, the task would seem
> to be to take a DCat document (examples [1][2][3]) and use the API to
> create/update packages.
> 
> Sound reasonable?
> 

Sounds very reasonable. ;-)

J.

> -w
> 
> [1] http://semantic.ckan.net/package/aprsworld.n3
> [2] http://semantic.ckan.net/package/econ-alfred.rdf
> [3] http://catalogue.data.gov.uk/doc/dataset/asbo_counts.nt
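
To make the "read the RDF, then process it programmatically" idea concrete,
here is a minimal sketch in Python. It is not code from CKAN. The DCat
namespace URI, the FIELD_MAP mapping onto the CKAN package fields name,
title and notes, and the function names load_triples and
packages_from_triples are all assumptions made for illustration. The mapping
step works on plain (subject, predicate, object) triples, so it does not
care whether the document arrived as RDF/XML, N3 or N-Triples.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Assumption: the W3C DCat namespace; documents may use a different one.
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT = "http://purl.org/dc/terms/"

# Hypothetical mapping from Dublin Core terms to CKAN package fields.
FIELD_MAP = {
    DCT + "identifier": "name",
    DCT + "title": "title",
    DCT + "description": "notes",
}


def load_triples(text, rdf_format):
    """Parse RDF text with rdflib; rdf_format is e.g. "xml", "n3" or "nt"."""
    import rdflib  # third-party; imported here so the mapping needs no dependency

    graph = rdflib.Graph()
    graph.parse(data=text, format=rdf_format)
    return graph  # iterating a Graph yields (subject, predicate, object)


def packages_from_triples(triples):
    """Return one dict of package fields per resource typed dcat:Dataset."""
    fields = defaultdict(dict)
    datasets = {}  # used as an ordered set of dataset URIs
    for s, p, o in triples:
        # rdflib terms are str subclasses; normalise them to plain strings.
        s, p, o = str(s), str(p), str(o)
        if p == RDF_TYPE and o == DCAT_DATASET:
            datasets[s] = True
        elif p in FIELD_MAP:
            # Keep the first value seen for each field.
            fields[s].setdefault(FIELD_MAP[p], o)
    return [dict(fields.get(s, {})) for s in datasets]
```

Under these assumptions, packages_from_triples(load_triples(text, "nt"))
would give a list of dicts that a separate script could pass to the CKAN
API to create or update packages, without touching the CKAN codebase.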



