[ckan-dev] CSW Harvesting from GeoPortal and GeoNetwork

Ryan Hodges rhodges at ecotrust.org
Wed Nov 7 01:40:11 UTC 2012


Hi all,

I am trying to harvest spatial metadata using:
CKAN [1.8]
ckanext-harvest [master]
ckanext-spatial [harvest-generic-iso]

Python version 2.6

Using the plugin:
gemini_csw_harvester

And in my config file:
ckan.spatial.validator.profiles = iso19139

Every site I try to harvest from fails with a different issue. I am not yet familiar enough with the limitations of CSW harvesting to tell which of these are shortcomings of the source, which are current limitations of CKAN harvesting, and which are my own fault:

When I harvest from a GeoNetwork site:
---------------------------------------------------------------
url: http://apps.who.int/geonetwork/srv/csw
status: Gathering errors

- Error contacting the CSW server: '2.0.2'
Harvested: 0
Why: The server is at version 2.0.1. Is harvesting from this version not supported?
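
One way to confirm which CSW version a server advertises is to read the `version` attribute on the root element of its GetCapabilities response. The following is a minimal sketch, assuming the response has already been fetched as a string. The function name `advertised_csw_version` and the trimmed sample document are illustrative only and are not part of ckanext-spatial.

```python
import xml.etree.ElementTree as ET


def advertised_csw_version(capabilities_xml):
    """Return the version attribute of the Capabilities root element, or None."""
    root = ET.fromstring(capabilities_xml)
    return root.get("version")


# Hypothetical, heavily trimmed response used only for illustration.
SAMPLE = (
    '<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw" '
    'version="2.0.1"/>'
)

if __name__ == "__main__":
    print(advertised_csw_version(SAMPLE))
```

Comparing the returned value with the version the harvester requests (the '2.0.2' in the error above) would show whether the two sides disagree.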
---------------------------------------------------------------
url: http://www.fao.org/geonetwork/srv/en/csw
status: Object errors

- GUID 6fed4955-c0f4-49e6-aaf2-9475504dc6bc <http://ckan.labs.ecotrust.org/harvest/object/8a4a345e-1ab7-4a3a-a89f-79ff4f548961>
- Validating against "ISO19139 XSD Schema" profile failed:
- Dataset schema (gmx.xsd) Validation Error: (u"Element '{http://www.isotc211.org/2005/gmd}MD_SatelliteSpatialRepresentation': This element is not expected. Expected is one of ( {http://www.isotc211.org/2005/gmd}AbstractMD_SpatialRepresentation, {http://www.isotc211.org/2005/gmd}MD_GridSpatialRepresentation, {http://www.isotc211.org/2005/gmd}MD_VectorSpatialRepresentation, {http://www.isotc211.org/2005/gmd}MD_Georeferenceable, {http://www.isotc211.org/2005/gmd}MD_Georectified )., line 74",)
Harvested: 10
Why: As the error says, the validator doesn't recognize MD_SatelliteSpatialRepresentation. Another error reported a malformed date-time.
NOTE: Validation didn't work at first because the directory for ISO19139 validation did not exist in the Python 2.6 egg:
'site-packages/ckanext_spatial-0.2-py2.6.egg/ckanext/spatial/validation/xml'  <- Does not exist
I had to copy it in manually.
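
A quick check for the missing directory can be scripted before running a harvest. This is a minimal sketch, assuming the egg layout matches the path quoted above. The helper name `schema_dir_status` is hypothetical, and `package_root` is whatever egg or checkout directory is installed locally.

```python
import os


def schema_dir_status(package_root):
    """Return (path, exists) for the validation XML directory under package_root."""
    path = os.path.join(package_root, "ckanext", "spatial", "validation", "xml")
    return path, os.path.isdir(path)


if __name__ == "__main__":
    # Example root taken from the path above; adjust to the local site-packages.
    root = os.path.join("site-packages", "ckanext_spatial-0.2-py2.6.egg")
    path, exists = schema_dir_status(root)
    print("%s -> %s" % (path, "present" if exists else "missing"))
```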
---------------------------------------------------------------

When I harvest from a geoportal site:
---------------------------------------------------------------
url: http://gptogc.esri.com/geoportal/csw
status: Gathering errors

- Error gathering the identifiers from the CSW server ['NoneType' object has no attribute 'find']
Harvested: 0
Why: ???
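
The message `'NoneType' object has no attribute 'find'` is what Python raises when `.find` is called on a lookup that returned `None`, for example when a response lacks an element the parsing code expects. The sketch below reproduces that failure mode and shows a defensive alternative. The document and element names are made up, none of this is taken from the harvester's code, and whether this is the actual cause here is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up response that lacks the <results> element the caller expects.
RESPONSE = "<response><status>ok</status></response>"


def first_record_unsafe(xml_text):
    root = ET.fromstring(xml_text)
    # root.find("results") returns None here, so the chained .find raises
    # AttributeError: 'NoneType' object has no attribute 'find'.
    return root.find("results").find("record")


def first_record_safe(xml_text):
    root = ET.fromstring(xml_text)
    results = root.find("results")
    if results is None:
        # The expected element is absent; report that instead of crashing.
        return None
    return results.find("record")
```

If the cause is similar, inspecting the raw GetRecords response from the server would show which expected element is missing.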
---------------------------------------------------------------
url: http://geo.data.gov/geoportal/csw
status: Gathering errors

- Error gathering the identifiers from the CSW server ['\nAn exception occurred with no applicable code\n']
Harvested: 0
Why: ???
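
The text in this error reads like the body of an OWS exception report, which CSW servers return in place of a normal response when a request fails. Assuming that is the case, the sketch below pulls the exception code, locator, and text out of such a report so the server's own explanation can be read directly. The sample report is hypothetical and the function name `parse_exception_report` is illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by OWS 1.0.0 exception reports.
OWS_NS = "http://www.opengis.net/ows"


def parse_exception_report(xml_text):
    """Return a list of (exceptionCode, locator, [texts]) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    for exc in root.findall("{%s}Exception" % OWS_NS):
        texts = [
            (t.text or "").strip()
            for t in exc.findall("{%s}ExceptionText" % OWS_NS)
        ]
        results.append((exc.get("exceptionCode"), exc.get("locator"), texts))
    return results


# Hypothetical report used only for illustration.
SAMPLE = (
    '<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows" version="1.2.0">'
    '<ows:Exception exceptionCode="NoApplicableCode">'
    '<ows:ExceptionText>\nAn exception occurred with no applicable code\n'
    '</ows:ExceptionText>'
    '</ows:Exception>'
    '</ows:ExceptionReport>'
)

if __name__ == "__main__":
    for code, locator, texts in parse_exception_report(SAMPLE):
        print("%s %s %s" % (code, locator, texts))
```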
---------------------------------------------------------------

If anyone knows why these might be failing, what I could do to fix them, or what might be fixed soon in the harvester to alleviate this, please respond.

Thanks,
Ryan Hodges | Applications Developer | Ecotrust
721 NW 9th Avenue, Suite 200 * Portland, OR 97209
T (503) 467.0800 | F (503) 222.1517 | www.ecotrust.org
