[ckan-dev] CSW Harvesting from GeoPortal and GeoNetwork

Adrià Mercader adria.mercader at okfn.org
Wed Nov 7 15:46:39 UTC 2012


Hi,

Thanks, all, for the feedback and pointers.

Before answering specific issues, let me stress again what I mentioned
in earlier threads: support for harvesting generic CSW sources is a
feature currently under development, and it will need significant
work before it is reliable. It hasn't been thoroughly tested, the
harvester code can be complex, and the errors returned are sometimes
obscure. We are working to improve both of these things (literally
right now). As more servers are tested, each with its own issues,
more problems will no doubt surface, so it will take some time until
this is stable enough for production.

Ryan, David, Tom: see comments below.

>> > Am 07.11.2012 2:40, schrieb Ryan Hodges:
>> > And in my config file:
>> >
>> > ckan.spatial.validator.profiles = iso19139

>> > url: http://apps.who.int/geonetwork/srv/csw
>> >
>> > status: Gathering errors
>> >
>> > - Error contacting the CSW server: '2.0.2'
>> >
>> > Harvested: 0
>> >
>> > WHY: Server is at version 2.0.1. Is harvesting from this version not
>> > available?
CKAN relies on OWSLib for querying the CSW servers, and if, as Tom
mentioned, it only supports CSW 2.0.2, that's what CKAN will support.
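
As a rough sketch of how a version mismatch could be spotted before
harvesting (this is not what the harvester does today), the snippet
below reads the version attribute from a GetCapabilities response. The
function names and the two sample documents are hypothetical, written
by hand for illustration rather than taken from a real server.

```python
import xml.etree.ElementTree as ET

# Per Tom's note, OWSLib's CSW support is for 2.0.2 only.
SUPPORTED_CSW_VERSIONS = {"2.0.2"}


def capabilities_version(capabilities_xml):
    """Return the 'version' attribute of the root element, or None."""
    root = ET.fromstring(capabilities_xml)
    return root.get("version")


def is_harvestable(capabilities_xml):
    """True if the advertised CSW version is one we can query."""
    return capabilities_version(capabilities_xml) in SUPPORTED_CSW_VERSIONS


# Hand-written minimal samples (assumptions, not real server output).
SAMPLE_201 = (
    '<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw" '
    'version="2.0.1"/>'
)
SAMPLE_202 = (
    '<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'version="2.0.2"/>'
)

print(capabilities_version(SAMPLE_201), is_harvestable(SAMPLE_201))
print(capabilities_version(SAMPLE_202), is_harvestable(SAMPLE_202))
```

A check like this would let the harvester report "unsupported CSW
version" instead of the rather obscure '2.0.2' message.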


>> > url: http://www.fao.org/geonetwork/srv/en/csw
>> >
>> > status: Object errors
>> >
>> > - GUID 6fed4955-c0f4-49e6-aaf2-9475504dc6bc
>> >
>> > - Validating against "ISO19139 XSD Schema" profile failed:
>> >
>> > - Dataset schema (gmx.xsd) Validation Error: (u"Element
>> > '{http://www.isotc211.org/2005/gmd}MD_SatelliteSpatialRepresentation':
>> > This
>> > element is not expected. Expected is one of (
>> > {http://www.isotc211.org/2005/gmd}AbstractMD_SpatialRepresentation,
>> > {http://www.isotc211.org/2005/gmd}MD_GridSpatialRepresentation,
>> > {http://www.isotc211.org/2005/gmd}MD_VectorSpatialRepresentation,
>> > {http://www.isotc211.org/2005/gmd}MD_Georeferenceable,
>> > {http://www.isotc211.org/2005/gmd}MD_Georectified )., line 74",)
>> >
>> > Harvested: 10
>> >
>> > Why: As the error says, it doesn’t recognize
>> > MD_SatelliteSpatialRepresentation. Another error included a bad
>> > date-time.
This is just a validation error for this document, which does not
adhere to the ISO 19139 XSD schema. The current harvesting
implementation will not prevent the package from being created, but
you can try other ISO validation profiles to see if the errors go
away. There are profiles for the NGDC (iso19139ngdc) and EDEN
(iso19139eden) schemas available.


>> > NOTE: This didn’t work at first: the directory for ISO19139 validation
>> > did
>> > not exist in the python 2.6 egg:
>> >
>> >
>> > ‘site-packages/ckanext_spatial-0.2-py2.6.egg/ckanext/spatial/validation/xml’
>> > <- Does not exist
>> >
>> > I had to copy it in manually.
How did you install ckanext-spatial? Was it from sources?


>> > url: http://gptogc.esri.com/geoportal/csw
>> >
>> > status: Gathering errors
>> >
>> > - Error gathering the identifiers from the CSW server ['NoneType'
>> > object has no attribute 'find']
I'm not sure if this is exactly the same issue, but I found a similar
one that needed fixing in OWSLib [1].
Feel free to try my fork [2] with the changes to see if that solves the issue.
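
For illustration only, this kind of message typically appears when
XML parsing code chains lookups and an expected element is missing
from the response. The sketch below reproduces it with the standard
library and shows a guarded alternative. The element names and the
helper are hypothetical, and this is not the actual OWSLib code path.

```python
import xml.etree.ElementTree as ET


def nested_text(root, outer_tag, inner_tag):
    """Return the text of outer_tag/inner_tag, or None if either is missing."""
    outer = root.find(outer_tag)
    if outer is None:
        return None
    inner = outer.find(inner_tag)
    return inner.text if inner is not None else None


# A response that lacks the (hypothetical) <identification> element.
doc = ET.fromstring("<record><title>Example</title></record>")

# Unguarded chaining: find() returns None, so the second find() fails.
try:
    doc.find("identification").find("title")
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
print(nested_text(doc, "identification", "title"))
```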




>> > url: http://geo.data.gov/geoportal/csw
>> >
>> > status: Gathering errors
>> >
>> > - Error gathering the identifiers from the CSW server ['\nAn
>> > exception occurred with no applicable code\n']
For some reason, the remote server returns an exception while
gathering the identifiers. We'd need to investigate this further.


On 7 November 2012 10:24, David Read <david.read at hackneyworkshop.com> wrote:
> Ryan,
>
> I see from the code that "Error gathering the identifiers from the CSW
> server" is a problem with calling OWSLib's "getidentifiers" method,
> which is the first time we use this library to call the CSW server.
Just to be clear, getidentifiers is a CKAN method (csw_client.py)
which in turn calls OWSLib's csw.getrecords.
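
To give an idea of the shape of that wrapper, here is a minimal sketch
of paging through a record source to collect identifiers. It is not
the code in csw_client.py: fetch_page is a hypothetical callable
standing in for whatever issues the actual request, and the fake
server below exists only to make the example runnable.

```python
def iter_identifiers(fetch_page, page_size=10):
    """Yield record identifiers, requesting one page at a time.

    fetch_page(start, size) is a hypothetical callable that returns
    (identifiers_on_this_page, total_number_of_matches).
    """
    start = 0
    while True:
        identifiers, total = fetch_page(start, page_size)
        if not identifiers:
            break  # empty page: nothing more to read
        for identifier in identifiers:
            yield identifier
        start += len(identifiers)
        if start >= total:
            break  # every matching record has been seen


# Stand-in for a remote server holding 25 records (assumption).
ALL_IDS = ["guid-%d" % n for n in range(25)]


def fake_fetch_page(start, size):
    return ALL_IDS[start:start + size], len(ALL_IDS)


harvested = list(iter_identifiers(fake_fetch_page, page_size=10))
print(len(harvested))
```

Keeping the request behind a callable makes it easier to tell whether
a failure comes from the paging logic or from the library underneath.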


> The version of OWSLib that ckanext-spatial tries to use is not the
> original SVN repo, but an OKF branch here:
> https://github.com/okfn/owslib
FYI, all the latest work done on harvesting already targets the latest
version of OWSLib (i.e. not the okfn one).



On 7 November 2012 13:13, Tom Kralidis <tomkralidis at hotmail.com> wrote:

>
> Testing each of those CSWs with OWSLib directly resulted in no errors when
> testing with 0.5.1 in a virtualenv; however I am making rudimentary
> requests.
>
See previous comments on where OWSLib may be involved.


>
> CSW 2.0.2 (despite the .z bumps) is very different from 2.0.1 and 2.0.0.
> OWSLib's CSW support is for 2.0.2 only.
That's really useful to know.


> I'm not familiar with the CKAN code on top of OWSLib.  Can anyone point to
> the code, or a trace of what the context of the issue is?  I'm willing to
> hunt down the cause of the issue here.
Thanks, your thoughts on [1] would be great.


Hope this helps a bit

Adrià



[1] https://github.com/geopython/OWSLib/pull/40
[2] https://github.com/amercader/OWSLib



