[ckan-dev] csw harvesting
Hildegard Gerlach
hildegard.gerlach at jrc.ec.europa.eu
Tue Apr 8 12:11:11 UTC 2014
Dear all,
I have a problem harvesting from a CSW server. I get the following error
message:
2014-04-08 11:43:27,063 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
  File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
    for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
  File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 120, in getidentifiers
    csw.getrecords2(**kwa)
  File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 343, in getrecords2
    self._invoke()
  File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 611, in _invoke
    raise RuntimeError, 'Document is XML, but not CSW-ish'
RuntimeError: Document is XML, but not CSW-ish
2014-04-08 11:43:27,081 ERROR [ckanext.harvest.harvesters.base] Error gathering the identifiers from the CSW server [Document is XML, but not CSW-ish]
2014-04-08 11:43:27,095 ERROR [ckanext.harvest.queue] Gather stage failed
I think the problem is in the GetCapabilities response of the CSW server,
which advertises the operation
<ows:Operation name="Harvest">
whereas the other CSW servers have this part commented out:
<!--
<ows:Operation name="Harvest">
  <ows:DCP>
    <ows:HTTP>
      <ows:Get xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
      <ows:Post xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
    </ows:HTTP>
  </ows:DCP>
</ows:Operation>
-->
Looking into owslib/csw.py I can see the following:
    # parse result see if it's XML
    self._exml = etree.parse(StringIO.StringIO(self.response))

    # it's XML. Attempt to decipher whether the XML response is CSW-ish """
    valid_xpaths = [
        util.nspath_eval('ows:ExceptionReport', namespaces),
        util.nspath_eval('csw:Capabilities', namespaces),
        util.nspath_eval('csw:DescribeRecordResponse', namespaces),
        util.nspath_eval('csw:GetDomainResponse', namespaces),
        util.nspath_eval('csw:GetRecordsResponse', namespaces),
        util.nspath_eval('csw:GetRecordByIdResponse', namespaces),
        util.nspath_eval('csw:HarvestResponse', namespaces),
        util.nspath_eval('csw:TransactionResponse', namespaces)
    ]

    if self._exml.getroot().tag not in valid_xpaths:
        raise RuntimeError, 'Document is XML, but not CSW-ish'
However, csw:HarvestResponse is included in that list, so I don't understand
why the Harvest operation would cause problems.
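Since the check quoted above only compares the root element of the response
with a fixed list of tags, one way to narrow this down might be to save the
raw response body and look at its root tag directly. The following is only a
rough sketch using the standard library rather than OWSLib itself: the
namespace URIs are assumed to be the CSW 2.0.2 and OWS 1.0.0 ones, and
qualify() and root_tag_report() are hypothetical helper names, not part of
OWSLib or ckanext-spatial.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (CSW 2.0.2 and OWS 1.0.0); the server may use others.
NAMESPACES = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "ows": "http://www.opengis.net/ows",
}

# Root elements listed in the check quoted from owslib/csw.py.
VALID_ROOTS = [
    "ows:ExceptionReport",
    "csw:Capabilities",
    "csw:DescribeRecordResponse",
    "csw:GetDomainResponse",
    "csw:GetRecordsResponse",
    "csw:GetRecordByIdResponse",
    "csw:HarvestResponse",
    "csw:TransactionResponse",
]


def qualify(name):
    """Hypothetical helper: 'csw:Capabilities' -> '{uri}Capabilities'."""
    prefix, local = name.split(":")
    return "{%s}%s" % (NAMESPACES[prefix], local)


def root_tag_report(xml_text):
    """Hypothetical helper: return (root_tag, passes_check) for a response body."""
    root = ET.fromstring(xml_text)
    valid = [qualify(name) for name in VALID_ROOTS]
    return root.tag, root.tag in valid


if __name__ == "__main__":
    # Made-up sample body; replace with the response saved from the server.
    sample = (
        '<csw:GetRecordsResponse '
        'xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"/>'
    )
    print(root_tag_report(sample))
```

If the root tag reported for the real response is not one of the listed
elements (for example an HTML error page, or an ExceptionReport in a different
OWS namespace version), that would explain the RuntimeError independently of
the Harvest operation in GetCapabilities.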
I am using OWSLib 0.8.6.
Any help would be appreciated.
Thanks,
Hilde