[ckan-dev] csw harvesting

Hildegard Gerlach hildegard.gerlach at jrc.ec.europa.eu
Tue Apr 8 15:36:14 UTC 2014


The solution was that I had to append

?request=GetCapabilities&service=CSW&version=2.0.2

to the URL of the server to harvest. For some other servers, however, I
had problems when I appended the GetCapabilities request. Are there any
guidelines on how to specify the URL?
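As a rough illustration of the fix described above, the sketch below appends the GetCapabilities parameters to an endpoint URL using Python's standard urllib.parse module. It skips any parameter the URL already carries, comparing names case-insensitively because OGC key-value parameter names are not case-sensitive. The helper name with_getcapabilities and the example host are hypothetical. Whether a particular harvester or server wants the query string at all varies, as noted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical helper (not part of ckanext-spatial or owslib): the CSW
# GetCapabilities parameters quoted in this message.
CAPABILITIES_PARAMS = {
    "request": "GetCapabilities",
    "service": "CSW",
    "version": "2.0.2",
}


def with_getcapabilities(url):
    """Append the GetCapabilities parameters that are not already in url."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = parse_qsl(query, keep_blank_values=True)
    # OGC KVP parameter names are case-insensitive, so compare lowercased keys.
    present = {key.lower() for key, _ in params}
    for key, value in CAPABILITIES_PARAMS.items():
        if key not in present:
            params.append((key, value))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))
```

For example, with_getcapabilities("http://example.org/csw") returns http://example.org/csw?request=GetCapabilities&service=CSW&version=2.0.2, and calling it again on that result leaves it unchanged.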

Thanks

Hilde

On 4/8/2014 2:30 PM, Philippe Duchesne wrote:
> Can you share the URL to your CSW, or its capabilities document ?
>
> --p.
>
>
> On Tue, Apr 8, 2014 at 2:11 PM, Hildegard Gerlach
> <hildegard.gerlach at jrc.ec.europa.eu> wrote:
>
>     Dear all,
>
>     I have a problem harvesting from a CSW server. I get the following
>     error message:
>
>     2014-04-08 11:43:27,063 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
>       File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
>         for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
>       File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 120, in getidentifiers
>         csw.getrecords2(**kwa)
>       File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 343, in getrecords2
>         self._invoke()
>       File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 611, in _invoke
>         raise RuntimeError, 'Document is XML, but not CSW-ish'
>     RuntimeError: Document is XML, but not CSW-ish
>     2014-04-08 11:43:27,081 ERROR [ckanext.harvest.harvesters.base] Error gathering the identifiers from the CSW server [Document is XML, but not CSW-ish]
>     2014-04-08 11:43:27,095 ERROR [ckanext.harvest.queue] Gather stage failed
>
>
>     I think the problem is in the GetCapabilities response of this CSW
>     server, which contains
>     <ows:Operation name="Harvest">
>
>     while the other CSW servers have this part commented out:
>     <!--
>             <ows:Operation name="Harvest">
>                 <ows:DCP>
>                     <ows:HTTP>
>                         <ows:Get xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>                         <ows:Post xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>                     </ows:HTTP>
>                 </ows:DCP>
>             </ows:Operation>
>     -->
>
>     Looking into owslib/csw.py I can see the following:
>
>             # parse result see if it's XML
>             self._exml = etree.parse(StringIO.StringIO(self.response))
>
>             # it's XML.  Attempt to decipher whether the XML response is CSW-ish
>             valid_xpaths = [
>                 util.nspath_eval('ows:ExceptionReport', namespaces),
>                 util.nspath_eval('csw:Capabilities', namespaces),
>                 util.nspath_eval('csw:DescribeRecordResponse', namespaces),
>                 util.nspath_eval('csw:GetDomainResponse', namespaces),
>                 util.nspath_eval('csw:GetRecordsResponse', namespaces),
>                 util.nspath_eval('csw:GetRecordByIdResponse', namespaces),
>                 util.nspath_eval('csw:HarvestResponse', namespaces),
>                 util.nspath_eval('csw:TransactionResponse', namespaces)
>             ]
>
>             if self._exml.getroot().tag not in valid_xpaths:
>                 raise RuntimeError, 'Document is XML, but not CSW-ish'
>
>
>     but csw:HarvestResponse is included in that list, so I don't
>     understand why the Harvest operation causes problems.
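The check quoted above compares only the root element of the response against the list. A standard-library sketch of the same idea makes this visible: an <ows:Operation name="Harvest"> nested inside a csw:Capabilities document never reaches the comparison, and csw:HarvestResponse only matters when it is the root of a reply to a Harvest request. The namespace URIs below are assumed to match what owslib binds to the csw and ows prefixes (CSW 2.0.2 and OWS 1.0.0), and is_csw_ish and VALID_ROOTS are illustrative names, not owslib API.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the "csw" and "ows" prefixes in the snippet above.
CSW = "http://www.opengis.net/cat/csw/2.0.2"
OWS = "http://www.opengis.net/ows"

# Accepted root elements, written in ElementTree's "{namespace}local" form,
# which is the same form owslib's nspath_eval produces.
VALID_ROOTS = {
    "{%s}ExceptionReport" % OWS,
    "{%s}Capabilities" % CSW,
    "{%s}DescribeRecordResponse" % CSW,
    "{%s}GetDomainResponse" % CSW,
    "{%s}GetRecordsResponse" % CSW,
    "{%s}GetRecordByIdResponse" % CSW,
    "{%s}HarvestResponse" % CSW,
    "{%s}TransactionResponse" % CSW,
}


def is_csw_ish(document):
    """Return True if the root element of the XML document is a known CSW response."""
    root = ET.fromstring(document)
    # Only the root tag is inspected; nested elements are ignored.
    return root.tag in VALID_ROOTS
```

Under this reading, a Capabilities document advertising Harvest still passes, so the "not CSW-ish" error points to the server returning some other XML document at the top level for the request the harvester sent.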
>
>     I am using OWSLib 0.8.6.
>
>     Any help appreciated.
>
>     Thanks
>
>     Hilde
>     _______________________________________________
>     ckan-dev mailing list
>     ckan-dev at lists.okfn.org
>     https://lists.okfn.org/mailman/listinfo/ckan-dev
>     Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
