[ckan-dev] CSW Harvesting from GeoPortal and GeoNetwork

David Read david.read at hackneyworkshop.com
Wed Nov 7 10:24:02 UTC 2012


Ryan,

I see from the code that "Error gathering the identifiers from the CSW
server" is a problem with calling OWSLib's "getidentifiers" method,
which is the first point at which we use this library to call the CSW
server. So trying different versions of OWSLib, as Konrad mentions, may
help. The full exception gets written to the CKAN log, so you could let
us know what that says.
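
If you reproduce the call by hand rather than digging through the log,
it helps to capture the whole traceback and not just the one-line
message the harvester reports. A minimal sketch, assuming nothing about
OWSLib itself: `run_with_traceback` and `broken_parse` are hypothetical
names, and `broken_parse` only imitates library code that assumes an
XML element was found when it was not.

```python
import traceback


def run_with_traceback(func, *args, **kwargs):
    """Call func; return (result, None) on success or (None, traceback text) on failure."""
    try:
        return func(*args, **kwargs), None
    except Exception:
        return None, traceback.format_exc()


def broken_parse(element):
    # Hypothetical stand-in: calls .find() on an element that may be None.
    return element.find("child")


# Passing None reproduces the kind of message seen in the harvest report,
# but with the file names and line numbers of every frame attached.
result, tb = run_with_traceback(broken_parse, None)
print(tb)
```

The last line of the traceback is the short message; the frames above it
show which call actually received `None`.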

The version of OWSLib that ckanext-spatial tries to use is not the
original SVN repo but an OKF branch:
https://github.com/okfn/owslib
It appears to be based on 0.3, with a few tweaks for the servers we
have been using. I see the code already contains various workarounds
for some US gov servers, which suggests that the CSW world is not in
complete agreement about the meaning of the specs...

I'm interested to hear that 0.4 and 0.5 are out now, so it would be
good to try these. I suggest you try playing with OWSLib and your CSW
servers on the command line, in a similar way to what the README does
for WMS servers:
https://github.com/okfn/owslib/blob/master/README.txt
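
A minimal sketch of what such a session could look like for CSW. It
assumes an OWSLib version that provides `owslib.csw.CatalogueServiceWeb`
with a `version` keyword and `identification` / `operations` attributes,
as upstream OWSLib does; the OKF branch may differ. `describe_csw` is a
made-up helper name, and the import sits inside the function so the
file still loads where OWSLib is not installed.

```python
def describe_csw(url, version="2.0.2"):
    """Connect to a CSW endpoint and summarise what its capabilities report.

    Constructing CatalogueServiceWeb sends a GetCapabilities request, so
    this needs network access and raises if the response cannot be parsed.
    """
    # Imported lazily: a missing OWSLib only fails when this is called.
    from owslib.csw import CatalogueServiceWeb

    csw = CatalogueServiceWeb(url, version=version)
    return {
        "type": csw.identification.type,
        "version": csw.version,
        "operations": [op.name for op in csw.operations],
    }


# Example call (not run here because it contacts a live server):
#   describe_csw("http://gptogc.esri.com/geoportal/csw")
```

If even this step fails for a given server, the problem lies between
OWSLib and the server rather than in the harvester.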

The CSW part of OWSLib was done by Tom Kralidis, whom I've copied on
this in case he can shed some light on the CSW versioning issue. I
think that it calls the CSW server using version 2.0.2, but this should
be compatible with 2.0.0 and 2.0.1. There appears to be a way to
change the version if necessary. Guidance from Tom on this would be
most helpful for all of us.
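
To see which version a server actually answers with, independently of
OWSLib, you can request its capabilities directly and read the
`version` attribute on the root element. A rough sketch using only the
standard library (Python 3, whereas the setup in this thread is Python
2.6). `capabilities_url`, `reported_version` and the sample XML are
illustrative assumptions, and a server may ignore `AcceptVersions` or
answer with an exception report instead.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def capabilities_url(base_url, accept_version="2.0.2"):
    """Build a KVP GetCapabilities URL asking for a specific CSW version."""
    params = [
        ("service", "CSW"),
        ("request", "GetCapabilities"),
        ("AcceptVersions", accept_version),
    ]
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def reported_version(capabilities_xml):
    """Return the version on a Capabilities root element, else None."""
    root = ET.fromstring(capabilities_xml)
    local_name = root.tag.rsplit("}", 1)[-1]  # strip the XML namespace
    if local_name != "Capabilities":
        return None  # e.g. an ExceptionReport came back instead
    return root.get("version")


# Made-up minimal response, just to show the parsing step.
SAMPLE = ('<csw:Capabilities '
          'xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
          'version="2.0.2"/>')
print(capabilities_url("http://apps.who.int/geonetwork/srv/csw"))
print(reported_version(SAMPLE))
```

Fetching the printed URL in a browser or with curl and passing the body
to `reported_version` shows whether the server reports 2.0.1 or 2.0.2.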

David

On 7 November 2012 02:04, Konrad Reiche <konrad.reiche at gmail.com> wrote:
> Hi Ryan,
>
> I also ran into the error "Error gathering the identifiers from the CSW
> server ['NoneType' object has no attribute 'find']". Here is what worked
> for me:
>
> Check what version you are using with
>
> pip freeze | grep -i owslib
>
> For me OWSLib 0.4.0 works, so I suggest you uninstall your current OWSLib
> installation with
>
> pip uninstall owslib
>
> and install the 0.4.0 version from the GitHub repository:
>
> pip install -e git+https://github.com/geopython/OWSLib.git@0.4.0#egg=OWSLib
>
> When I tried the latest version, 0.5-dev, the error stayed the same. I am
> using CKAN 1.8 and Harvest + Spatial [latest] as well.
>
> Best,
> Konrad
>
> On 07.11.2012 2:40, Ryan Hodges wrote:
>
> Hi all,
>
>
>
> I am trying to harvest spatial metadata using:
>
> CKAN [1.8]
>
> ckanext-harvest [master]
>
> ckanext-spatial [harvest-generic-iso]
>
>
>
> Python version 2.6
>
>
>
> Using the plugin:
>
> gemini_csw_harvester
>
>
>
> And in my config file:
>
> ckan.spatial.validator.profiles = iso19139
>
>
>
> With every site I try to harvest from, there seems to be another issue
> preventing me from succeeding. I currently am not familiar enough with the
> limitations of CSW harvesting to determine which of these are a shortcoming
> of the source, which are a current limitation of CKAN harvesting, and which
> are my own fault:
>
>
>
> When I harvest from a GeoNetwork site:
>
> ---------------------------------------------------------------
>
> url: http://apps.who.int/geonetwork/srv/csw
>
> status: Gathering errors
>
> -          Error contacting the CSW server: '2.0.2'
>
> Harvested: 0
>
> Why: The server is at version 2.0.1. Is harvesting from this version not
> supported?
>
> ---------------------------------------------------------------
>
> url: http://www.fao.org/geonetwork/srv/en/csw
>
> status: Object errors
>
> -          GUID 6fed4955-c0f4-49e6-aaf2-9475504dc6bc
>
> -          Validating against "ISO19139 XSD Schema" profile failed:
>
> -          Dataset schema (gmx.xsd) Validation Error: (u"Element
> '{http://www.isotc211.org/2005/gmd}MD_SatelliteSpatialRepresentation': This
> element is not expected. Expected is one of (
> {http://www.isotc211.org/2005/gmd}AbstractMD_SpatialRepresentation,
> {http://www.isotc211.org/2005/gmd}MD_GridSpatialRepresentation,
> {http://www.isotc211.org/2005/gmd}MD_VectorSpatialRepresentation,
> {http://www.isotc211.org/2005/gmd}MD_Georeferenceable,
> {http://www.isotc211.org/2005/gmd}MD_Georectified )., line 74",)
>
> Harvested: 10
>
> Why: As the error says, it doesn’t recognize
> MD_SatelliteSpatialRepresentation. Another error included a bad date-time.
>
> NOTE: This didn’t work at first: the directory for ISO19139 validation did
> not exist in the Python 2.6 egg:
>
> ‘site-packages/ckanext_spatial-0.2-py2.6.egg/ckanext/spatial/validation/xml’
> <- Does not exist
>
> I had to copy it in manually.
>
> ---------------------------------------------------------------
>
>
>
> When I harvest from a geoportal site:
>
> ---------------------------------------------------------------
>
> url: http://gptogc.esri.com/geoportal/csw
>
> status: Gathering errors
>
> -          Error gathering the identifiers from the CSW server ['NoneType'
> object has no attribute 'find']
>
> Harvested: 0
>
> Why: ???
>
> ---------------------------------------------------------------
>
> url: http://geo.data.gov/geoportal/csw
>
> status: Gathering errors
>
> -          Error gathering the identifiers from the CSW server ['\nAn
> exception occurred with no applicable code\n']
>
> Harvested: 0
>
> Why: ???
>
> ---------------------------------------------------------------
>
>
>
> If anyone knows why these might be failing, what I could do to fix them, or
> what might be fixed soon in the harvester to alleviate this, please respond.
>
>
>
> Thanks,
>
> Ryan Hodges | Applications Developer | Ecotrust
>
> 721 NW 9th Avenue, Suite 200 • Portland, OR 97209
>
> T (503) 467.0800 | F (503) 222.1517 | www.ecotrust.org
>
>
>
>
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev



