[ckan-dev] CSW Harvesting from GeoPortal and GeoNetwork

Ryan Hodges rhodges at ecotrust.org
Tue Nov 13 00:09:56 UTC 2012


Thanks Adrià, Tom, David, and Konrad! I am trying out your suggestions, but wanted to answer your questions in the meantime:

"How did you install ckanext-spatial? Was it from sources?" -
Yes, I installed CKAN [1.8], ckanext-harvest [master], and ckanext-spatial [harvest-generic-iso] from source. However, I did have to deviate from the instructions: the modules weren't recognized until I ran `setup.py install` on them (`setup.py develop` didn't seem to do the trick; possibly a Python 2.6 issue?).

"Feel free to try my fork with the changes to see if that's solves the issue" - 
Will do.

I will continue to work on testing this. What we're looking for is building a catalog that can harvest metadata (iso19139, maybe even fgdc if available) from GeoNetwork, GeoPortal, and (hopefully) other CSW v2.0.2 enabled servers. I know you are working hard on implementing some (all?) of that, but I suppose I have to ask before committing to a solution: is harvesting from these sources something CKAN is committed to doing, or only something it would like to do and is trying out?

Thanks again so much for all of the help and consideration!
Ryan Hodges | Applications Developer | Ecotrust
721 NW 9th Avenue, Suite 200 . Portland, OR 97209
T (503) 467.0800 | F (503) 222.1517 | www.ecotrust.org


-----Original Message-----
From: ckan-dev-bounces at lists.okfn.org [mailto:ckan-dev-bounces at lists.okfn.org] On Behalf Of Adrià Mercader
Sent: Wednesday, November 07, 2012 7:47 AM
To: CKAN Development Discussions
Subject: Re: [ckan-dev] CSW Harvesting from GeoPortal and GeoNetwork

Hi,

Thanks all for all the feedback and pointers.

Before answering specific issues, let me stress again what I mentioned in earlier threads: support for harvesting generic CSW sources is a feature currently under development, and it is bound to need significant work before it is reliable. It hasn't been thoroughly tested, the harvester code can be complex, and the errors returned are sometimes obscure; we are working to improve both of these things (literally right now). As more servers are tested, each with its own issues, more problems will no doubt surface, so it will take some time until this is stable enough for production.

Ryan, David, Tom see comments below:

>> > Am 07.11.2012 2:40, schrieb Ryan Hodges:
>> > And in my config file:
>> >
>> > ckan.spatial.validator.profiles = iso19139

>> > url: http://apps.who.int/geonetwork/srv/csw
>> >
>> > status: Gathering errors
>> >
>> > - Error contacting the CSW server: '2.0.2'
>> >
>> > Harvested: 0
>> >
>> > WHY: Server is at version 2.0.1. Is harvesting from this version 
>> > not available?
CKAN relies on OWSLib for querying the CSW servers, and if, as Tom mentioned, OWSLib only supports CSW 2.0.2, that's what CKAN will support.
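
Since only CSW 2.0.2 is supported, a quick pre-check of the version a server advertises can save a failed harvest run. The following is only a minimal sketch, not part of CKAN or OWSLib: it assumes you have already fetched the server's GetCapabilities response as a string, and the function names and the two one-line sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The only CSW version OWSLib (and therefore the harvester) is said to support.
SUPPORTED_CSW_VERSION = "2.0.2"


def csw_version(capabilities_xml):
    """Return the 'version' attribute of a Capabilities document root, or None."""
    root = ET.fromstring(capabilities_xml)
    return root.get("version")


def is_harvestable(capabilities_xml):
    """True only if the document advertises the supported CSW version."""
    return csw_version(capabilities_xml) == SUPPORTED_CSW_VERSION


# Hypothetical, heavily trimmed sample documents (real responses are much larger).
SAMPLE_202 = ('<csw:Capabilities '
              'xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" version="2.0.2"/>')
SAMPLE_201 = ('<csw:Capabilities '
              'xmlns:csw="http://www.opengis.net/cat/csw" version="2.0.1"/>')

if __name__ == "__main__":
    for name, doc in (("2.0.2 sample", SAMPLE_202), ("2.0.1 sample", SAMPLE_201)):
        print(name, "harvestable:", is_harvestable(doc))
```

A source that fails this check would be expected to produce a "Gathering errors" status like the one above.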


>> > url: http://www.fao.org/geonetwork/srv/en/csw
>> >
>> > status: Object errors
>> >
>> > - GUID 6fed4955-c0f4-49e6-aaf2-9475504dc6bc
>> >
>> > - Validating against "ISO19139 XSD Schema" profile failed:
>> >
>> > - Dataset schema (gmx.xsd) Validation Error: (u"Element
>> > '{http://www.isotc211.org/2005/gmd}MD_SatelliteSpatialRepresentation':
>> > This
>> > element is not expected. Expected is one of ( 
>> > {http://www.isotc211.org/2005/gmd}AbstractMD_SpatialRepresentation,
>> > {http://www.isotc211.org/2005/gmd}MD_GridSpatialRepresentation,
>> > {http://www.isotc211.org/2005/gmd}MD_VectorSpatialRepresentation,
>> > {http://www.isotc211.org/2005/gmd}MD_Georeferenceable,
>> > {http://www.isotc211.org/2005/gmd}MD_Georectified )., line 74",)
>> >
>> > Harvested: 10
>> >
>> > Why: As the error says, it doesn't recognize 
>> > MD_SatelliteSpatialRepresentation. Another error included a bad 
>> > date-time.
This is just a validation error for this document, which does not conform to the ISO 19139 XSD schema. The current harvesting implementation will not prevent the package from being created, but you can try other ISO validation profiles to see if the errors go away. There are profiles for the NGDC (iso19139ngdc) and EDEN (iso19139eden) schemas available.


>> > NOTE: This didn't work at first: the directory for ISO19139 
>> > validation did not exist in the python 2.6 egg:
>> >
>> >
>> > 'site-packages/ckanext_spatial-0.2-py2.6.egg/ckanext/spatial/validation/xml'
>> > <- Does not exist
>> >
>> > I had to copy it in manually.
How did you install ckanext-spatial? Was it from sources?


>> > url: http://gptogc.esri.com/geoportal/csw
>> >
>> > status: Gathering errors
>> >
>> > - Error gathering the identifiers from the CSW server ['NoneType'
>> > object has no attribute 'find']
I'm not sure if this is exactly the same issue, but I found a similar one that needed fixing in OWSLib [1].
Feel free to try my fork [2] with the changes to see if that solves the issue.




>> > url: http://geo.data.gov/geoportal/csw
>> >
>> > status: Gathering errors
>> >
>> > - Error gathering the identifiers from the CSW server ['\nAn 
>> > exception occurred with no applicable code\n']
For some reason, the remote server returns an exception while gathering the identifiers. We'd need to investigate this further.


On 7 November 2012 10:24, David Read <david.read at hackneyworkshop.com> wrote:
> Ryan,
>
> I see from the code that "Error gathering the identifiers from the CSW 
> server" is a problem with calling OWSLib's "getidentifiers" method, 
> which is the first time we use this library to call the CSW server.
Just to be clear, getidentifiers is a CKAN method (csw_client.py) which in turn calls OWSLib's csw.getrecords.
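
To illustrate the relationship, here is a rough sketch of how a helper can collect identifiers by calling getrecords page by page. This is not the actual csw_client.py code: iter_identifiers and FakeCSW are hypothetical names, the fake client exists only so the sketch runs without a network, and the offset handling is an assumption (real servers may count start positions differently).

```python
def iter_identifiers(csw, page_size=10):
    """Yield record identifiers from a CSW client object, one page at a time.

    'csw' is assumed to expose getrecords(), a 'records' mapping keyed by
    identifier, and a 'results' dict with a 'matches' count.
    """
    start = 0
    while True:
        csw.getrecords(esn="brief", startposition=start, maxrecords=page_size)
        if not csw.records:
            break  # empty page: nothing (more) to gather
        for identifier in csw.records:
            yield identifier
        start += len(csw.records)
        if start >= csw.results.get("matches", 0):
            break  # every matching record has been seen


class FakeCSW:
    """Hypothetical stand-in for a CSW client, for illustration only."""

    def __init__(self, identifiers):
        self._all = list(identifiers)
        self.records = {}
        self.results = {}

    def getrecords(self, esn="brief", startposition=0, maxrecords=10):
        page = self._all[startposition:startposition + maxrecords]
        self.records = {identifier: None for identifier in page}
        self.results = {"matches": len(self._all), "returned": len(page)}


if __name__ == "__main__":
    client = FakeCSW(["guid-1", "guid-2", "guid-3", "guid-4", "guid-5"])
    print(list(iter_identifiers(client, page_size=2)))
```

An exception raised inside the getrecords call at this stage is the kind of failure that would be reported as "Error gathering the identifiers from the CSW server".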


> The version of OWSLib that ckanext-spatial tries to use is not the 
> original SVN repo, but an OKF branch here:
> https://github.com/okfn/owslib
FYI, all the latest work done on harvesting already targets the latest version of OWSLib (i.e. not the okfn one).



On 7 November 2012 13:13, Tom Kralidis <tomkralidis at hotmail.com> wrote:

>
> Testing each of those CSWs with OWSLib directly resulted in no errors 
> when testing with 0.5.1 in a virtualenv; however I am making 
> rudimentary requests.
>
See previous comments on where OWSLib may be involved.


>
> CSW 2.0.2 (despite the .z bumps) is very different from 2.0.1 and 2.0.0.
> OWSLib's CSW support is for 2.0.2 only.
That's really useful to know.


> I'm not familiar with the CKAN code on top of OWSLib.  Can anyone 
> point to the code, or a trace of what the context of the issue is?  
> I'm willing to hunt down the cause of the issue here.
Thanks, your thoughts on [1] would be great.


Hope this helps a bit

Adrià



[1] https://github.com/geopython/OWSLib/pull/40
[2] https://github.com/amercader/OWSLib
