[ckan-dev] CSW Harvesting from GeoPortal and GeoNetwork

Adrià Mercader adria.mercader at okfn.org
Tue Nov 13 11:32:40 UTC 2012


Hi Ryan,

On 13 November 2012 00:09, Ryan Hodges <rhodges at ecotrust.org> wrote:
> I will continue to work on testing this. What we're looking for is building a catalog that can harvest metadata (iso19139, maybe even fgdc if available) from GeoNetwork, GeoPortal, and (hopefully) other CSW v2.0.2 enabled servers. I know you are working hard on implementing some (all?) of that, but I suppose I have to ask before committing to a solution: is harvesting from these sources something CKAN is committed to doing, or only something it would like to do and is trying out?
The current work will allow harvesting ISO 19139 documents from
servers implementing CSW 2.0.2 (this is already possible but, as you
have found, needs some work). We also support WAF (Web Accessible
Folders), and in this case FGDC documents will also be supported (they
will first be transformed to ISO). We may eventually offer support
for FGDC documents over CSW, but this is still not clear.
This work is funded and under way. Beyond that, we cannot guarantee
immediate support for other features (although we'd love to support
more stuff).
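
To make "ISO 19139 over CSW 2.0.2" a bit more concrete, here is a rough
sketch of the kind of GetRecords request involved. It is not the
harvester code: the function name, the example.org endpoint and the
exact parameter set are assumptions for illustration, and it assumes the
server accepts key-value (GET) requests, which not every server does.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Namespace used to ask a CSW server for ISO 19139 (gmd) records.
ISO_19139_SCHEMA = "http://www.isotc211.org/2005/gmd"


def build_getrecords_url(endpoint, start=1, page_size=10):
    """Build a CSW 2.0.2 GetRecords URL (hypothetical helper).

    `start` is the 1-based position of the first record to return and
    `page_size` is the number of records requested per call.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "full",
        "outputSchema": ISO_19139_SCHEMA,
        "startPosition": start,
        "maxRecords": page_size,
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint; ask for the second page of ten records.
url = build_getrecords_url("http://example.org/csw", start=11)
query = parse_qs(urlsplit(url).query)
print(url)
```

A harvester would repeat such a request, increasing startPosition each
time, until the server reports that no records are left.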

Adrià






> Thanks again so much for all of the help and consideration!
> Ryan Hodges | Applications Developer | Ecotrust
> 721 NW 9th Avenue, Suite 200 . Portland, OR 97209
> T (503) 467.0800 | F (503) 222.1517 | www.ecotrust.org
>
>
> -----Original Message-----
> From: ckan-dev-bounces at lists.okfn.org [mailto:ckan-dev-bounces at lists.okfn.org] On Behalf Of Adrià Mercader
> Sent: Wednesday, November 07, 2012 7:47 AM
> To: CKAN Development Discussions
> Subject: Re: [ckan-dev] CSW Harvesting from GeoPortal and GeoNetwork
>
> Hi,
>
> Thanks all for all the feedback and pointers.
>
> Before answering specific issues, let me stress again what I mentioned in earlier threads: support for harvesting generic CSW sources is a feature currently under development, and it is bound to need significant work before it works reliably. It hasn't been thoroughly tested, the harvester code can be complex, and the errors returned are sometimes obscure; we are working to improve both of these things (literally right now). As more servers are tested, each with its own issues, more problems will no doubt surface, so it will take some time until this is stable enough for production.
>
> Ryan, David, Tom, see comments below:
>
>>> > On 07.11.2012 2:40, Ryan Hodges wrote:
>>> > And in my config file:
>>> >
>>> > ckan.spatial.validator.profiles = iso19139
>
>>> > url: http://apps.who.int/geonetwork/srv/csw
>>> >
>>> > status: Gathering errors
>>> >
>>> > - Error contacting the CSW server: '2.0.2'
>>> >
>>> > Harvested: 0
>>> >
>>> > WHY: Server is at version 2.0.1. Is harvesting from this version
>>> > not available?
> CKAN relies on OWSLib for querying the CSW servers, and if, as Tom mentioned, it only supports CSW 2.0.2, that's what CKAN will support.
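
One way to spot this situation before harvesting is to look at the version
a server advertises in its GetCapabilities response. The sketch below only
reads the version attribute from the root element of a capabilities
document; the function names and the two tiny sample documents are made up
for illustration and are not taken from CKAN, OWSLib or any real server.

```python
import xml.etree.ElementTree as ET

# The only CSW version mentioned in this thread as supported by OWSLib.
SUPPORTED_CSW_VERSION = "2.0.2"


def advertised_csw_version(capabilities_xml):
    """Return the `version` attribute of the root element, or None."""
    root = ET.fromstring(capabilities_xml)
    return root.get("version")


def is_supported(capabilities_xml):
    """True if the capabilities document advertises the supported version."""
    return advertised_csw_version(capabilities_xml) == SUPPORTED_CSW_VERSION


# Hand-written, heavily trimmed samples (assumptions, not real responses).
SAMPLE_202 = (
    '<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" '
    'version="2.0.2"/>'
)
SAMPLE_201 = (
    '<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw" '
    'version="2.0.1"/>'
)

for sample in (SAMPLE_202, SAMPLE_201):
    print(advertised_csw_version(sample), is_supported(sample))
```

A check like this would let a harvest source be rejected with a clear
"unsupported CSW version" message rather than a bare '2.0.2' error.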
>
>
>>> > url: http://www.fao.org/geonetwork/srv/en/csw
>>> >
>>> > status: Object errors
>>> >
>>> > - GUID 6fed4955-c0f4-49e6-aaf2-9475504dc6bc
>>> >
>>> > - Validating against "ISO19139 XSD Schema" profile failed:
>>> >
>>> > - Dataset schema (gmx.xsd) Validation Error: (u"Element
>>> > '{http://www.isotc211.org/2005/gmd}MD_SatelliteSpatialRepresentation':
>>> > This
>>> > element is not expected. Expected is one of (
>>> > {http://www.isotc211.org/2005/gmd}AbstractMD_SpatialRepresentation,
>>> > {http://www.isotc211.org/2005/gmd}MD_GridSpatialRepresentation,
>>> > {http://www.isotc211.org/2005/gmd}MD_VectorSpatialRepresentation,
>>> > {http://www.isotc211.org/2005/gmd}MD_Georeferenceable,
>>> > {http://www.isotc211.org/2005/gmd}MD_Georectified )., line 74",)
>>> >
>>> > Harvested: 10
>>> >
>>> > Why: As the error says, it doesn't recognize
>>> > MD_SatelliteSpatialRepresentation. Another error included a bad
>>> > date-time.
> This is just a validation error for this document, which does not conform to the ISO 19139 XSD schema. The current harvesting implementation will not prevent the package from being created, but you can try other ISO validation profiles to see if the errors go away. There are profiles for the NGDC (iso19139ngdc) and EDEN (iso19139eden) schemas available.
>
>
>>> > NOTE: This didn't work at first: the directory for ISO19139
>>> > validation did not exist in the python 2.6 egg:
>>> >
>>> >
>>> > 'site-packages/ckanext_spatial-0.2-py2.6.egg/ckanext/spatial/validation/xml'
>>> > <- Does not exist
>>> >
>>> > I had to copy it in manually.
> How did you install ckanext-spatial? Was it from sources?
>
>
>>> > url: http://gptogc.esri.com/geoportal/csw
>>> >
>>> > status: Gathering errors
>>> >
>>> > - Error gathering the identifiers from the CSW server ['NoneType'
>>> > object has no attribute 'find']
> I'm not sure if this is exactly the same issue, but I found a similar one that needed fixing in OWSLib [1].
> Feel free to try my fork with the changes [2] to see if that solves the issue.
>
>
>
>
>>> > url: http://geo.data.gov/geoportal/csw
>>> >
>>> > status: Gathering errors
>>> >
>>> > - Error gathering the identifiers from the CSW server ['\nAn
>>> > exception occurred with no applicable code\n']
> For some reason, the remote server returns an exception while gathering the identifiers. We'd need to investigate this further.
>
>
> On 7 November 2012 10:24, David Read <david.read at hackneyworkshop.com> wrote:
>> Ryan,
>>
>> I see from the code that "Error gathering the identifiers from the CSW
>> server" is a problem with calling OWSLib's "getidentifiers" method,
>> which is the first time we use this library to call the CSW server.
> Just to be clear, getidentifiers is a CKAN method (csw_client.py) which in turn calls OWSLib's csw.getrecords.
>
>
>> The version of OWSLib that ckanext-spatial tries to use is not the
>> original SVN repo, but an OKF branch here:
>> https://github.com/okfn/owslib
> FYI, all the latest work done on harvesting already targets the latest version of OWSLib (i.e. not the okfn one).
>
>
>
> On 7 November 2012 13:13, Tom Kralidis <tomkralidis at hotmail.com> wrote:
>
>>
>> Testing each of those CSWs with OWSLib directly resulted in no errors
>> when testing with 0.5.1 in a virtualenv; however I am making
>> rudimentary requests.
>>
> See previous comments on where OWSLib may be involved.
>
>
>>
>> CSW 2.0.2 (despite the .z bumps) is very different from 2.0.1 and 2.0.0.
>> OWSLib's CSW support is for 2.0.2 only.
> That's really useful to know.
>
>
>> I'm not familiar with the CKAN code on top of OWSLib.  Can anyone
>> point to the code, or a trace of what the context of the issue is?
>> I'm willing to hunt down the cause of the issue here.
> Thanks, your thoughts on [1] would be great.
>
>
> Hope this helps a bit
>
> Adrià
>
>
>
> [1] https://github.com/geopython/OWSLib/pull/40
> [2] https://github.com/amercader/OWSLib
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
>



