[ckan-dev] csw harvesting

Philippe Duchesne pduchesne at gmail.com
Tue Apr 8 12:30:59 UTC 2014


Can you share the URL of your CSW, or its capabilities document?

--p.


On Tue, Apr 8, 2014 at 2:11 PM, Hildegard Gerlach <
hildegard.gerlach at jrc.ec.europa.eu> wrote:

> Dear all,
>
> I have a problem harvesting from a CSW server. I get the following error
> message:
>
> 2014-04-08 11:43:27,063 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
>     for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 120, in getidentifiers
>     csw.getrecords2(**kwa)
>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 343, in getrecords2
>     self._invoke()
>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 611, in _invoke
>     raise RuntimeError, 'Document is XML, but not CSW-ish'
> RuntimeError: Document is XML, but not CSW-ish
> 2014-04-08 11:43:27,081 ERROR [ckanext.harvest.harvesters.base] Error gathering the identifiers from the CSW server [Document is XML, but not CSW-ish]
> 2014-04-08 11:43:27,095 ERROR [ckanext.harvest.queue] Gather stage failed
>
>
> I think the problem is in the GetCapabilities response of this CSW server,
> which advertises
> <ows:Operation name="Harvest">
>
> while the other CSW servers have this part commented out:
> <!--
>         <ows:Operation name="Harvest">
>             <ows:DCP>
>                 <ows:HTTP>
>                     <ows:Get xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>                     <ows:Post xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>                 </ows:HTTP>
>             </ows:DCP>
>         </ows:Operation>
> -->
>
> Looking into owslib/csw.py I can see the following:
>
>         # parse result see if it's XML
>         self._exml = etree.parse(StringIO.StringIO(self.response))
>
>         # it's XML.  Attempt to decipher whether the XML response is CSW-ish """
>         valid_xpaths = [
>             util.nspath_eval('ows:ExceptionReport', namespaces),
>             util.nspath_eval('csw:Capabilities', namespaces),
>             util.nspath_eval('csw:DescribeRecordResponse', namespaces),
>             util.nspath_eval('csw:GetDomainResponse', namespaces),
>             util.nspath_eval('csw:GetRecordsResponse', namespaces),
>             util.nspath_eval('csw:GetRecordByIdResponse', namespaces),
>             util.nspath_eval('csw:HarvestResponse', namespaces),
>             util.nspath_eval('csw:TransactionResponse', namespaces)
>         ]
>
>         if self._exml.getroot().tag not in valid_xpaths:
>             raise RuntimeError, 'Document is XML, but not CSW-ish'
>
>
> but HarvestResponse is included in that list, so I don't understand why it
> causes problems.
>
> I am using OWSLib 0.8.6.
>
> Any help appreciated.
>
> Thanks
>
> Hilde
>
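
The check quoted above from owslib/csw.py looks only at the root element of
the response the server sends back, so one way to narrow the problem down is
to see which root tag the failing request actually returns. Below is a minimal
sketch of that comparison using only the Python 3 standard library (the
traceback above is from Python 2.6, so this is not OWSLib's own code). The
namespace URIs, the helper names root_tag and looks_csw_ish, and the sample
documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed here for CSW 2.0.2 and OWS 1.0.0.
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OWS_NS = "http://www.opengis.net/ows"

# Root element names taken from the valid_xpaths list quoted above,
# written in ElementTree's "{namespace}localname" form.
VALID_ROOT_TAGS = {"{%s}ExceptionReport" % OWS_NS} | {
    "{%s}%s" % (CSW_NS, name)
    for name in (
        "Capabilities",
        "DescribeRecordResponse",
        "GetDomainResponse",
        "GetRecordsResponse",
        "GetRecordByIdResponse",
        "HarvestResponse",
        "TransactionResponse",
    )
}


def root_tag(xml_bytes):
    """Return the namespace-qualified tag of the document's root element."""
    return ET.fromstring(xml_bytes).tag


def looks_csw_ish(xml_bytes):
    """Hypothetical helper: True if the root tag is one of the accepted names."""
    return root_tag(xml_bytes) in VALID_ROOT_TAGS


if __name__ == "__main__":
    # Made-up sample documents, not responses from any real server.
    samples = [
        b'<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"/>',
        b'<ows:ExceptionReport xmlns:ows="http://www.opengis.net/ows/1.1"/>',
        b"<html><body>Login required</body></html>",
    ]
    for doc in samples:
        print(root_tag(doc), looks_csw_ish(doc))
```

To apply this to a real server, save the body of the failing GetRecords
response to a file and pass its bytes to root_tag. A root element in a
different namespace version, or an HTML page, would fail this comparison
whatever operations the capabilities document lists.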