[ckan-dev] Import complex, nested metadata schema with attributes ...

Heinrich Widmann widmann at dkrz.de
Mon Feb 18 16:03:07 UTC 2013


Hi there,

in our projects we have started using CKAN to harvest metadata.
We first collect metadata from several data providers (mostly in XML
format) on disk,
then convert it to JSON key:value pairs and import it into CKAN
using the dataset API (i.e. we send HTTP PUT requests ...).

This works fine for the predefined keys "Author", "Maintainer" (and
"State") under "Additional Information" (settings).
We add "new" keys via "extras" : { "newkey" : "value", ... }. I
wouldn't be surprised if this is not the appropriate way to add new keys?
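
For illustration, the kind of request we send can be sketched as follows. This is only a minimal sketch, not our actual harvester code: the helper name `build_dataset_put`, the host, the dataset name, the API key, and the URL path `/api/rest/dataset/<name>` are all assumptions, as is passing the key in the `Authorization` header. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_dataset_put(base_url, name, fields, extras, api_key):
    """Build (but do not send) an HTTP PUT request for one dataset.

    The URL layout and the Authorization header are assumptions; adjust
    them to the API version of your CKAN instance.
    """
    payload = dict(fields)            # predefined keys such as "author"
    payload["name"] = name
    payload["extras"] = dict(extras)  # "new" keys go into "extras"
    body = json.dumps(payload).encode("utf-8")
    url = "{}/api/rest/dataset/{}".format(base_url.rstrip("/"), name)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )


request = build_dataset_put(
    "http://ckan.example.org",        # hypothetical host
    "de-dkrz-mpim-iso20209",          # hypothetical dataset name
    {"author": "Example Author", "maintainer": "Example Maintainer"},
    {"newkey": "value"},
    "my-api-key",                     # placeholder, not a real key
)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`.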

In principle we have complex, nested XML schemas with attributes and
dependencies, e.g. something like:
<metadata xmlns="http://www.openarchives.org/OAI/2.0/"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <MD_Metadata xmlns="http://www.isotc211.org/2005/gmd"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ....
               xmlns:gco="http://www.isotc211.org/2005/gco"
               xmlns:oai="http://www.openarchives.org/OAI/2.0/"
               xmlns:iso="http://www.isotc211.org/2005/gmd"
               xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://www.isotc211.org/2005/gmd/metadataEntity.xsd"
               id="de.dkrz.mpim.iso20209">
    <fileIdentifier>
      <gco:CharacterString>de.dkrz.mpim.iso20209</gco:CharacterString>
    </fileIdentifier>
    ......................

If you convert this with a simple xml2json converter to a flat key:value
schema, you get something like:

  {"{http://www.openarchives.org/OAI/2.0/}metadata":
    {"{http://www.isotc211.org/2005/gmd}MD_Metadata":
      {"{http://www.isotc211.org/2005/gmd}dataQualityInfo":
        {"{http://www.isotc211.org/2005/gmd}DQ_DataQuality": .............

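The namespace-qualified keys shown above are what a naive conversion typically produces. A minimal sketch with Python's standard `xml.etree.ElementTree`, which reports tags as `{namespace}name`, reproduces the effect. This is not the converter we actually use: the function name `element_to_dict` is hypothetical, and `SAMPLE` is a shortened, closed version of the fragment above.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Naively convert an element into nested dicts keyed by qualified tag.

    Attributes are dropped and repeated sibling tags overwrite each other,
    which is one reason such a conversion loses information.
    """
    children = list(element)
    if not children:
        return (element.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}


# Shortened stand-in for the real record (assumption, for illustration only).
SAMPLE = """
<metadata xmlns="http://www.openarchives.org/OAI/2.0/">
  <MD_Metadata xmlns="http://www.isotc211.org/2005/gmd"
               xmlns:gco="http://www.isotc211.org/2005/gco"
               id="de.dkrz.mpim.iso20209">
    <fileIdentifier>
      <gco:CharacterString>de.dkrz.mpim.iso20209</gco:CharacterString>
    </fileIdentifier>
  </MD_Metadata>
</metadata>
""".strip()

root = ET.fromstring(SAMPLE)
nested = {root.tag: element_to_dict(root)}
```

Every key carries its full namespace URI, and the `id` attribute of `MD_Metadata` is lost.
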
This can be imported into CKAN, but not in a structured, "resolved" and
searchable key:value form.
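
To make clearer what we mean by a "resolved" form, here is a rough sketch of one possible flattening: namespaces are stripped and the nesting is folded into dotted key names. The dotted naming convention and the helper names `local_name` and `flatten` are assumptions for illustration only; the sketch ignores attributes and possible name clashes between namespaces.

```python
def local_name(qualified):
    """Drop a leading "{namespace}" from a qualified tag name."""
    return qualified.rsplit("}", 1)[-1]


def flatten(nested, prefix=""):
    """Fold nested dicts into a flat {"a.b.c": value} mapping."""
    flat = {}
    for key, value in nested.items():
        path = prefix + local_name(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


# Shortened example input in the namespace-qualified form shown above.
nested = {
    "{http://www.openarchives.org/OAI/2.0/}metadata": {
        "{http://www.isotc211.org/2005/gmd}MD_Metadata": {
            "{http://www.isotc211.org/2005/gmd}fileIdentifier": {
                "{http://www.isotc211.org/2005/gco}CharacterString": "de.dkrz.mpim.iso20209"
            }
        }
    }
}

flat = flatten(nested)
```

Each entry of `flat` could then be passed as one "extras" key, for example `metadata.MD_Metadata.fileIdentifier.CharacterString`.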

Thanks,
Heinrich


-- 
-----------------------------\\---------------------------------------
Heinrich Widmann              \\ Deutsches Klimarechenzentrum GmbH
Phone: +49 40 41173 282        \\   Abteilung Datenmanagement
FAX:   +49 40 41173 476         \\    Bundesstr. 45a
Email: widmann at dkrz.de           \\   D-20146 Hamburg
http://www.dkrz.de                \\  Germany
-----------------------------------\\---------------------------------




