[ckan-dev] Importing ISO19139 from GeoNetwork via CSW

Bruce Crevensten becrevensten at alaska.edu
Thu Sep 27 01:37:07 UTC 2012


Hi Adrià (and others!),

I posted a question on the ckan-discuss list about importing ISO19139
records from GeoNetwork via CSW. I'm posting to the dev list now
because I've started looking more deeply at the relevant code, which
currently isn't working for me.

My configuration file (development.ini) has these settings:

ckan.plugins = stats harvest ckan_harvester inspire_api
    gemini_harvester gemini_doc_harvester gemini_waf_harvester

ckan.inspire.validator.profiles = iso19139

My harvester job is set up to be type 'csw', and the URL endpoint is
this: http://athena.snap.uaf.edu:8080/geonetwork/srv/en/csw

My initial difficulty was that the Gemini2 profile was being used to
validate the records, and that wasn't working.  Adrià directed me to
check out specific branches of the ckanext-inspire and ckanext-csw
extensions, as follows:

* ckanext-inspire: git checkout harvest-generic-iso
* ckanext-csw: git checkout generic-iso-support

When I ran the import again (stopping/restarting all fetch/gather
processes, etc.) I got an error in the log during the fetch phase:

2012-09-26 16:30:11,996 ERROR [ckanext.inspire.harvesters] Error
getting the CSW record with GUID f620e331-6322-4896-894d-d27d8da0ab8f

After adding a bit more logging to the
ckanext-inspire/ckanext/inspire/harvesters.py file, I found that the
error was originating in the owslib package in my virtualenv, in the
csw code: it stated that "gmi" wasn't a valid namespace.

I confirmed that this namespace was missing from the csw package, and
tried hacking it into the 'namespaces' data structure in csw.py, but
with that change the request now fails with this exception:

'<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0.0"
xsi:schemaLocation="http://www.opengis.net/ows
http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">\n
<ows:Exception exceptionCode="InvalidParameterValue"
locator="outputSchema">\n
<ows:ExceptionText>http://www.isotc211.org/2005/gmi</ows:ExceptionText>\n
 </ows:Exception>\n</ows:ExceptionReport>'
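
For reference, the report above can be picked apart with the standard
library to see exactly which parameter was rejected. This is only a
minimal sketch, assuming Python 3 and nothing beyond xml.etree; the
function name parse_ows_exception is made up for illustration and is
not part of owslib or the harvester:

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the xmlns:ows attribute of the report itself.
OWS_NS = "http://www.opengis.net/ows"


def parse_ows_exception(xml_text):
    """Return (exceptionCode, locator, text) tuples from an OWS ExceptionReport."""
    root = ET.fromstring(xml_text)
    results = []
    for exc in root.iter("{%s}Exception" % OWS_NS):
        text_el = exc.find("{%s}ExceptionText" % OWS_NS)
        text = text_el.text.strip() if text_el is not None and text_el.text else ""
        results.append((exc.get("exceptionCode"), exc.get("locator"), text))
    return results


# The report quoted above, with the literal "\n" sequences turned into newlines.
REPORT = """<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0.0"
 xsi:schemaLocation="http://www.opengis.net/ows
 http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd">
  <ows:Exception exceptionCode="InvalidParameterValue" locator="outputSchema">
    <ows:ExceptionText>http://www.isotc211.org/2005/gmi</ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>"""

for code, locator, text in parse_ows_exception(REPORT):
    print(code, locator, text)
```

Read this way, the server seems to be rejecting the value of the
outputSchema parameter itself, which suggests the gmi namespace URI is
being sent as the requested output schema rather than merely being
looked up on the client side.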

I think I'm off track.  I'm not entirely sure why the process depends
on the gmi namespace, since a CSW request that I think is valid (below)
doesn't seem to reference that namespace in the resulting document:

http://athena.snap.uaf.edu:8080/geonetwork/srv/en/csw?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&id=4edfbeef-f830-4ce7-b6b1-557592ea8dce
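
To compare against whatever the harvester sends, the same request can
be built with an explicit outputSchema. Again just a sketch, assuming
Python 3; the helper name build_get_record_by_id_url is hypothetical,
and http://www.isotc211.org/2005/gmd is the standard ISO 19139 gmd
namespace, which I'm only assuming this GeoNetwork instance accepts as
an output schema:

```python
from urllib.parse import urlencode

CSW_ENDPOINT = "http://athena.snap.uaf.edu:8080/geonetwork/srv/en/csw"
# Assumption: the gmd namespace is an accepted outputSchema on this server.
GMD_NS = "http://www.isotc211.org/2005/gmd"


def build_get_record_by_id_url(endpoint, record_id, output_schema=None):
    """Build a CSW 2.0.2 GetRecordById KVP URL (hypothetical helper)."""
    params = [
        ("request", "GetRecordById"),
        ("service", "CSW"),
        ("version", "2.0.2"),
        ("elementSetName", "full"),
        ("id", record_id),
    ]
    if output_schema:
        # Only added when given, so the plain request above can be reproduced.
        params.append(("outputSchema", output_schema))
    return endpoint + "?" + urlencode(params)


url = build_get_record_by_id_url(
    CSW_ENDPOINT, "4edfbeef-f830-4ce7-b6b1-557592ea8dce", GMD_NS
)
print(url)
```

Swapping GMD_NS for the gmi URI from the exception should, if my
reading is right, reproduce the InvalidParameterValue error directly
against the endpoint.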

I haven't yet checked GeoNetwork's server logs to track down the exact
requests that the harvester process is making. That's a reasonable
next step to see if my endpoint is invalid.

Am I encountering a known limitation of the harvester (not all
namespaces/CSW sources can be harvested), or does this seem like a
misconfiguration or other error on my part?

(And, thank you all for your collective work on CKAN, the various
extensions, and your helpful replies to my question on the
ckan-discuss list.  This is an impressive project, and I am glad to
have encountered it.)

- Bruce

--

Bruce Crevensten, Web Programmer
Scenarios Network for Alaska & Arctic Planning
3352 College Road, 2nd Floor Denali Building
Fairbanks, AK 99709
Phone: 907-474-7134
Fax: 907-474-7151
www.snap.uaf.edu
becrevensten at alaska.edu



