[ckan-dev] csw harvesting

Adrià Mercader adria.mercader at okfn.org
Wed Apr 9 16:05:31 UTC 2014


Hi Hilde,

At the CKAN level it doesn't matter if you add the GetCapabilities
parameters or not, as it just forwards the querying and parsing to
OWSLib. I'm not sure if this is an issue at the OWSLib level, although
I highly doubt that this would matter.
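
If you want to try both forms of the URL against a given server, something along these lines may help. It is only a sketch using the Python 3 standard library, not code from CKAN or OWSLib; the function name split_csw_url and the example.org URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that belong to a GetCapabilities request rather than to
# the endpoint itself (compared case-insensitively).
CAPABILITIES_PARAMS = {"request", "service", "version"}


def split_csw_url(url):
    """Return (base_url, capabilities_url) for a CSW endpoint URL.

    base_url has any request/service/version parameters removed;
    capabilities_url has them set to a CSW 2.0.2 GetCapabilities request.
    Any other query parameters are kept in their original order.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in CAPABILITIES_PARAMS
    ]
    base = urlunsplit(parts._replace(query=urlencode(kept)))
    caps_query = urlencode(
        kept
        + [("request", "GetCapabilities"), ("service", "CSW"), ("version", "2.0.2")]
    )
    caps = urlunsplit(parts._replace(query=caps_query))
    return base, caps


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the two variants.
    base, caps = split_csw_url("http://example.org/srv/en/csw?request=GetCapabilities")
    print(base)
    print(caps)
```

Either variant can then be entered as the harvest source URL to see which one a particular server accepts.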


Adrià

On 8 April 2014 16:36, Hildegard Gerlach
<hildegard.gerlach at jrc.ec.europa.eu> wrote:
> The solution was that I had to append
>
> ?request=GetCapabilities&service=CSW&version=2.0.2
>
> to the URL of the server to harvest. For some other servers I had problems
> if I appended the GetCapabilities request. Are there any guidelines on how
> to specify the URL?
>
> Thanks
>
> Hilde
>
>
> On 4/8/2014 2:30 PM, Philippe Duchesne wrote:
>
> Can you share the URL of your CSW, or its capabilities document?
>
> --p.
>
>
> On Tue, Apr 8, 2014 at 2:11 PM, Hildegard Gerlach
> <hildegard.gerlach at jrc.ec.europa.eu> wrote:
>>
>> Dear all,
>>
>> I have a problem harvesting from a CSW server. I get the following error
>> message:
>>
>> 2014-04-08 11:43:27,063 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
>>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
>>     for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
>>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 120, in getidentifiers
>>     csw.getrecords2(**kwa)
>>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 343, in getrecords2
>>     self._invoke()
>>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 611, in _invoke
>>     raise RuntimeError, 'Document is XML, but not CSW-ish'
>> RuntimeError: Document is XML, but not CSW-ish
>> 2014-04-08 11:43:27,081 ERROR [ckanext.harvest.harvesters.base] Error gathering the identifiers from the CSW server [Document is XML, but not CSW-ish]
>> 2014-04-08 11:43:27,095 ERROR [ckanext.harvest.queue] Gather stage failed
>>
>>
>> I think the problem is in the GetCapabilities response of the CSW server,
>> which contains
>> <ows:Operation name="Harvest">
>>
>> while the other CSW servers have this part commented out:
>> <!--
>>         <ows:Operation name="Harvest">
>>             <ows:DCP>
>>                 <ows:HTTP>
>>                     <ows:Get xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>>                     <ows:Post xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
>>                 </ows:HTTP>
>>             </ows:DCP>
>>         </ows:Operation>
>> -->
>>
>> Looking into owslib/csw.py I can see the following:
>>
>>         # parse result see if it's XML
>>         self._exml = etree.parse(StringIO.StringIO(self.response))
>>
>>         # it's XML.  Attempt to decipher whether the XML response is CSW-ish """
>>         valid_xpaths = [
>>             util.nspath_eval('ows:ExceptionReport', namespaces),
>>             util.nspath_eval('csw:Capabilities', namespaces),
>>             util.nspath_eval('csw:DescribeRecordResponse', namespaces),
>>             util.nspath_eval('csw:GetDomainResponse', namespaces),
>>             util.nspath_eval('csw:GetRecordsResponse', namespaces),
>>             util.nspath_eval('csw:GetRecordByIdResponse', namespaces),
>>             util.nspath_eval('csw:HarvestResponse', namespaces),
>>             util.nspath_eval('csw:TransactionResponse', namespaces)
>>         ]
>>
>>         if self._exml.getroot().tag not in valid_xpaths:
>>             raise RuntimeError, 'Document is XML, but not CSW-ish'
>>
>>
>> but harvest is included, so I don't understand why it creates problems.
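
The check quoted above looks only at the root element of whatever response came back, so one way to narrow this down is to see what that root element actually is. The following is a rough standalone sketch with the Python 3 standard library, not OWSLib code. The helper names are hypothetical, and it assumes that the 'csw' and 'ows' prefixes map to the CSW 2.0.2 and OWS namespace URIs shown, and that util.nspath_eval produces tags of the form {namespace}LocalName.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the 'csw' and 'ows' prefixes.
NAMESPACES = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "ows": "http://www.opengis.net/ows",
}

# Same element names as in the valid_xpaths list quoted above.
CSW_ROOTS = [
    "ows:ExceptionReport",
    "csw:Capabilities",
    "csw:DescribeRecordResponse",
    "csw:GetDomainResponse",
    "csw:GetRecordsResponse",
    "csw:GetRecordByIdResponse",
    "csw:HarvestResponse",
    "csw:TransactionResponse",
]


def qualified(name):
    """Turn 'prefix:Local' into the '{namespace-uri}Local' form."""
    prefix, local = name.split(":")
    return "{%s}%s" % (NAMESPACES[prefix], local)


def describe_root(xml_text):
    """Return (root_tag, is_csw_ish) for an XML response body."""
    root = ET.fromstring(xml_text)
    allowed = {qualified(name) for name in CSW_ROOTS}
    return root.tag, root.tag in allowed


if __name__ == "__main__":
    # Made-up response body; in practice, paste the body the server returned.
    sample = '<html><body>Service moved</body></html>'
    print(describe_root(sample))
```

Feeding it the body the server returns for the failing request shows which root element, and which namespace, the response really has.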
>>
>> I am using OWSLib 0.8.6.
>>
>> Any help appreciated.
>>
>> Thanks
>>
>> Hilde
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev


