[ckan-dev] csw harvesting

Hildegard Gerlach hildegard.gerlach at jrc.ec.europa.eu
Tue Apr 8 12:11:11 UTC 2014


Dear all,

I have a problem harvesting from a CSW server. I get the following error
message:

2014-04-08 11:43:27,063 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
  File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
    for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
  File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 120, in getidentifiers
    csw.getrecords2(**kwa)
  File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 343, in getrecords2
    self._invoke()
  File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 611, in _invoke
    raise RuntimeError, 'Document is XML, but not CSW-ish'
RuntimeError: Document is XML, but not CSW-ish
2014-04-08 11:43:27,081 ERROR [ckanext.harvest.harvesters.base] Error gathering the identifiers from the CSW server [Document is XML, but not CSW-ish]
2014-04-08 11:43:27,095 ERROR [ckanext.harvest.queue] Gather stage failed


I think the problem is in the GetCapabilities response of this CSW server,
which contains
<ows:Operation name="Harvest">

while the other CSW servers have this part commented out:
<!--
         <ows:Operation name="Harvest">
             <ows:DCP>
                 <ows:HTTP>
                     <ows:Get xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
                     <ows:Post xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
                 </ows:HTTP>
             </ows:DCP>
         </ows:Operation>
-->
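
To compare the servers, the operations that a GetCapabilities document
advertises can be listed with a few lines of standard-library Python. This is
only a sketch: operation_names and the two sample documents are made up for
illustration, and it assumes the OWS namespace http://www.opengis.net/ows.
The parser skips XML comments, so a commented-out Harvest operation does not
show up.

```python
import xml.etree.ElementTree as ET

# Assumption: the capabilities document uses the OWS 1.0 namespace.
OWS_NS = "http://www.opengis.net/ows"


def operation_names(capabilities_xml):
    """Return the name attribute of every ows:Operation element (hypothetical helper)."""
    root = ET.fromstring(capabilities_xml)
    return [op.get("name") for op in root.iter("{%s}Operation" % OWS_NS)]


# Made-up minimal capabilities documents, one with Harvest active and one
# with Harvest commented out.
SAMPLE_ACTIVE = """<csw:Capabilities xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
                  xmlns:ows="http://www.opengis.net/ows">
  <ows:OperationsMetadata>
    <ows:Operation name="GetRecords"/>
    <ows:Operation name="Harvest"/>
  </ows:OperationsMetadata>
</csw:Capabilities>"""

SAMPLE_COMMENTED = SAMPLE_ACTIVE.replace(
    '<ows:Operation name="Harvest"/>',
    '<!-- <ows:Operation name="Harvest"/> -->',
)

print(operation_names(SAMPLE_ACTIVE))
print(operation_names(SAMPLE_COMMENTED))
```

Feeding it the saved GetCapabilities output of each server would show whether
the advertised operations really differ only in Harvest.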

Looking into owslib/csw.py I can see the following:

         # parse result see if it's XML
         self._exml = etree.parse(StringIO.StringIO(self.response))

         # it's XML.  Attempt to decipher whether the XML response is CSW-ish """
         valid_xpaths = [
             util.nspath_eval('ows:ExceptionReport', namespaces),
             util.nspath_eval('csw:Capabilities', namespaces),
             util.nspath_eval('csw:DescribeRecordResponse', namespaces),
             util.nspath_eval('csw:GetDomainResponse', namespaces),
             util.nspath_eval('csw:GetRecordsResponse', namespaces),
             util.nspath_eval('csw:GetRecordByIdResponse', namespaces),
             util.nspath_eval('csw:HarvestResponse', namespaces),
             util.nspath_eval('csw:TransactionResponse', namespaces)
         ]

         if self._exml.getroot().tag not in valid_xpaths:
             raise RuntimeError, 'Document is XML, but not CSW-ish'


csw:HarvestResponse is included in that list, so I don't understand why it
causes a problem.
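
As far as I can tell, the check quoted above only looks at the root element of
whatever the server returned for the request, not at the operations listed in
GetCapabilities. A minimal standalone sketch of that comparison, using only
the standard library, is below. The helper names qualify and root_tag_report
and the two sample documents are made up for illustration; the namespace URIs
are assumed to be the CSW 2.0.2 and OWS ones.

```python
import xml.etree.ElementTree as ET

# Assumption: CSW 2.0.2 and OWS 1.0 namespace URIs.
NAMESPACES = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "ows": "http://www.opengis.net/ows",
}

# Same root elements as the valid_xpaths list quoted above.
VALID_ROOTS = [
    "ows:ExceptionReport",
    "csw:Capabilities",
    "csw:DescribeRecordResponse",
    "csw:GetDomainResponse",
    "csw:GetRecordsResponse",
    "csw:GetRecordByIdResponse",
    "csw:HarvestResponse",
    "csw:TransactionResponse",
]


def qualify(name):
    """Turn 'csw:Capabilities' into '{namespace-uri}Capabilities'."""
    prefix, local = name.split(":")
    return "{%s}%s" % (NAMESPACES[prefix], local)


def root_tag_report(xml_text):
    """Return (root_tag, is_csw_ish) for a raw XML response body."""
    root = ET.fromstring(xml_text)
    valid = [qualify(name) for name in VALID_ROOTS]
    return root.tag, root.tag in valid


# Made-up responses: one with an accepted root element, one without.
SAMPLE_OK = '<csw:GetRecordsResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"/>'
SAMPLE_BAD = "<html><body>Login required</body></html>"

print(root_tag_report(SAMPLE_OK))
print(root_tag_report(SAMPLE_BAD))
```

Running root_tag_report on the raw response body of the failing GetRecords
request should show which root element the server actually returned.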

I am using OWSLib 0.8.6.

Any help appreciated.

Thanks

Hilde


