[ckan-dev] csw harvesting
Hildegard Gerlach
hildegard.gerlach at jrc.ec.europa.eu
Tue Apr 8 15:36:14 UTC 2014
The solution was to append
?request=GetCapabilities&service=CSW&version=2.0.2
to the URL of the server to harvest. For some other servers, however,
appending the GetCapabilities request caused problems. Are there any
guidelines on how to specify the URL?
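
For reference, a minimal sketch of how the parameters could be appended
without duplicating ones a URL already carries. It is written for
Python 3 (the installation quoted below runs Python 2.6), the helper name
with_capabilities_params and the example.org URL are made up for
illustration, and whether a given server needs these parameters at all
still has to be checked per server:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The parameters that worked for the server discussed in this thread.
CSW_CAPABILITIES_PARAMS = {
    "request": "GetCapabilities",
    "service": "CSW",
    "version": "2.0.2",
}


def with_capabilities_params(url):
    """Return url with the GetCapabilities parameters appended.

    Parameters already present in the query string are kept untouched;
    names are compared case-insensitively.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    present = {key.lower() for key, _ in query}
    for key, value in CSW_CAPABILITIES_PARAMS.items():
        if key not in present:
            query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical endpoint, for illustration only.
print(with_capabilities_params("http://example.org/srv/en/csw"))
```

A URL that already contains SERVICE=CSW keeps that parameter as it is and
only gains the missing ones.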
Thanks
Hilde
On 4/8/2014 2:30 PM, Philippe Duchesne wrote:
> Can you share the URL to your CSW, or its capabilities document ?
>
> --p.
>
>
> On Tue, Apr 8, 2014 at 2:11 PM, Hildegard Gerlach
> <hildegard.gerlach at jrc.ec.europa.eu> wrote:
>
> Dear all,
>
> I have a problem harvesting from a CSW server. I get the following
> error message:
>
> 2014-04-08 11:43:27,063 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
>     for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 120, in getidentifiers
>     csw.getrecords2(**kwa)
>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 343, in getrecords2
>     self._invoke()
>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 611, in _invoke
>     raise RuntimeError, 'Document is XML, but not CSW-ish'
> RuntimeError: Document is XML, but not CSW-ish
> 2014-04-08 11:43:27,081 ERROR [ckanext.harvest.harvesters.base] Error gathering the identifiers from the CSW server [Document is XML, but not CSW-ish]
> 2014-04-08 11:43:27,095 ERROR [ckanext.harvest.queue] Gather stage failed
>
>
> I think the problem is in the GetCapabilities response of the CSW
> server, which contains
> <ows:Operation name="Harvest">
>
> while the other CSW servers have this part commented out:
> <!--
> <ows:Operation name="Harvest">
>   <ows:DCP>
>     <ows:HTTP>
>       <ows:Get xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>       <ows:Post xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>     </ows:HTTP>
>   </ows:DCP>
> </ows:Operation>
> -->
>
> Looking into owslib/csw.py I can see the following:
>
> # parse result see if it's XML
> self._exml = etree.parse(StringIO.StringIO(self.response))
>
> # it's XML. Attempt to decipher whether the XML response is CSW-ish """
> valid_xpaths = [
>     util.nspath_eval('ows:ExceptionReport', namespaces),
>     util.nspath_eval('csw:Capabilities', namespaces),
>     util.nspath_eval('csw:DescribeRecordResponse', namespaces),
>     util.nspath_eval('csw:GetDomainResponse', namespaces),
>     util.nspath_eval('csw:GetRecordsResponse', namespaces),
>     util.nspath_eval('csw:GetRecordByIdResponse', namespaces),
>     util.nspath_eval('csw:HarvestResponse', namespaces),
>     util.nspath_eval('csw:TransactionResponse', namespaces)
> ]
>
> if self._exml.getroot().tag not in valid_xpaths:
>     raise RuntimeError, 'Document is XML, but not CSW-ish'
>
>
> but csw:HarvestResponse is included in that list, so I don't
> understand why the Harvest operation would create problems.
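
The check quoted above only looks at the root element of the response, so
one way to narrow the problem down is to fetch the server's reply by hand
and see what its root tag is. A rough sketch in Python 3 using only the
standard library; it is not OWSLib code, the function names are made up,
and the namespace URIs are the CSW 2.0.2 and OWS ones that the list above
is assumed to expand to:

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for CSW 2.0.2 and the OWS common schema.
CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
OWS_NS = "http://www.opengis.net/ows"

# Root tags corresponding to the valid_xpaths list, in {namespace}name form.
VALID_ROOT_TAGS = {"{%s}ExceptionReport" % OWS_NS} | {
    "{%s}%s" % (CSW_NS, name)
    for name in (
        "Capabilities",
        "DescribeRecordResponse",
        "GetDomainResponse",
        "GetRecordsResponse",
        "GetRecordByIdResponse",
        "HarvestResponse",
        "TransactionResponse",
    )
}


def root_tag(xml_text):
    """Return the namespace-qualified tag of the document's root element."""
    return ET.fromstring(xml_text).tag


def looks_like_csw(xml_text):
    """True if the root element is one of the tags listed above."""
    return root_tag(xml_text) in VALID_ROOT_TAGS


capabilities = '<csw:Capabilities xmlns:csw="%s" version="2.0.2"/>' % CSW_NS
error_page = "<html><body>Not found</body></html>"

print(root_tag(capabilities), looks_like_csw(capabilities))
print(root_tag(error_page), looks_like_csw(error_page))
```

If the root tag of the real response is not in the set, for example
because the endpoint answered with an HTML page or some other XML
document, that alone would explain the "not CSW-ish" error regardless of
which operations the capabilities document advertises.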
>
> I am using OWSLib 0.8.6.
>
> Any help appreciated.
>
> Thanks
>
> Hilde
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev