[ckan-dev] csw harvesting

Alessio Dragoni alessio.dragoni at gmail.com
Wed Apr 9 16:09:15 UTC 2014


I've experienced some issues with the 2.0.2 version response type, while I have
always got it working with the 1.3.0 CSW version type.
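
To compare how a server answers with and without the request parameters
discussed below, here is a minimal sketch that separates the bare endpoint
from the GetCapabilities parameters and rebuilds the URL for a chosen version.
It uses only the Python 3 standard library; the helper names
(split_csw_url, get_capabilities_url) and the example.org host are
hypothetical and are not part of CKAN or OWSLib.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters that describe the CSW request rather than the endpoint.
CSW_REQUEST_PARAMS = {"request", "service", "version"}


def split_csw_url(url):
    """Return (endpoint, removed) for a harvest source URL.

    endpoint is the URL without request/service/version parameters;
    removed is a dict of the stripped parameters with lower-cased keys.
    """
    parts = urlsplit(url)
    kept, removed = [], {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in CSW_REQUEST_PARAMS:
            removed[key.lower()] = value
        else:
            kept.append((key, value))
    endpoint = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
    return endpoint, removed


def get_capabilities_url(url, version="2.0.2"):
    """Build an explicit GetCapabilities URL for the given CSW version."""
    endpoint, _ = split_csw_url(url)
    parts = urlsplit(endpoint)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("service", "CSW"),
        ("version", version),
        ("request", "GetCapabilities"),
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), "")
    )


if __name__ == "__main__":
    source = ("http://example.org/srv/en/csw"
              "?request=GetCapabilities&service=CSW&version=2.0.2")
    print(split_csw_url(source)[0])
    print(get_capabilities_url(source))
```

Trying both forms of the URL against the same server should show whether the
appended parameters make any difference for that particular endpoint.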
On Apr 9, 2014 6:05 PM, "Adrià Mercader" <adria.mercader at okfn.org> wrote:

> Hi Hilde,
>
> At the CKAN level it doesn't matter if you add the GetCapabilities
> parameters or not, as it just forwards the querying and parsing to
> OWSLib. I'm not sure if this is an issue at the OWSLib level, although
> I highly doubt that this would matter.
>
>
> Adrià
>
> On 8 April 2014 16:36, Hildegard Gerlach
> <hildegard.gerlach at jrc.ec.europa.eu> wrote:
> > The solution was that I had to append
> >
> > ?request=GetCapabilities&service=CSW&version=2.0.2
> >
> > to the URL of the server to harvest. For some other servers I had problems
> > if I appended the GetCapabilities request. Are there any guidelines on how
> > to specify the URL?
> >
> > Thanks
> >
> > Hilde
> >
> >
> > On 4/8/2014 2:30 PM, Philippe Duchesne wrote:
> >
> > Can you share the URL to your CSW, or its capabilities document ?
> >
> > --p.
> >
> >
> > On Tue, Apr 8, 2014 at 2:11 PM, Hildegard Gerlach
> > <hildegard.gerlach at jrc.ec.europa.eu> wrote:
> >>
> >> Dear all,
> >>
> >> I have a problem harvesting from a csw server. I get the following error
> >> message
> >>
> >> 2014-04-08 11:43:27,063 ERROR [ckanext.spatial.harvesters.csw.CSW.gather] Exception: Traceback (most recent call last):
> >>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/harvesters/csw.py", line 95, in gather_stage
> >>     for identifier in self.csw.getidentifiers(page=10, outputschema=self.output_schema(), cql=cql):
> >>   File "/usr/local/ckan/pyenv/src/ckanext-spatial/ckanext/spatial/lib/csw_client.py", line 120, in getidentifiers
> >>     csw.getrecords2(**kwa)
> >>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 343, in getrecords2
> >>     self._invoke()
> >>   File "/usr/local/ckan/pyenv/lib/python2.6/site-packages/owslib/csw.py", line 611, in _invoke
> >>     raise RuntimeError, 'Document is XML, but not CSW-ish'
> >> RuntimeError: Document is XML, but not CSW-ish
> >> 2014-04-08 11:43:27,081 ERROR [ckanext.harvest.harvesters.base] Error gathering the identifiers from the CSW server [Document is XML, but not CSW-ish]
> >> 2014-04-08 11:43:27,095 ERROR [ckanext.harvest.queue] Gather stage failed
> >>
> >>
> >> I think the problem is in the GetCapabilities response of the CSW server,
> >> which has
> >> <ows:Operation name="Harvest">
> >>
> >> while the other CSW servers have this part commented out:
> >> <!--
> >>         <ows:Operation name="Harvest">
> >>             <ows:DCP>
> >>                 <ows:HTTP>
> >>                     <ows:Get xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
> >>                     <ows:Post xlink:href="http://$HOST:$PORT$SERVLET/srv/en/csw" />
> >>                 </ows:HTTP>
> >>             </ows:DCP>
> >>         </ows:Operation>
> >> -->
> >>
> >> Looking into owslib/csw.py I can see the following:
> >>
> >>         # parse result see if it's XML
> >>         self._exml = etree.parse(StringIO.StringIO(self.response))
> >>
> >>         # it's XML.  Attempt to decipher whether the XML response is CSW-ish """
> >>         valid_xpaths = [
> >>             util.nspath_eval('ows:ExceptionReport', namespaces),
> >>             util.nspath_eval('csw:Capabilities', namespaces),
> >>             util.nspath_eval('csw:DescribeRecordResponse', namespaces),
> >>             util.nspath_eval('csw:GetDomainResponse', namespaces),
> >>             util.nspath_eval('csw:GetRecordsResponse', namespaces),
> >>             util.nspath_eval('csw:GetRecordByIdResponse', namespaces),
> >>             util.nspath_eval('csw:HarvestResponse', namespaces),
> >>             util.nspath_eval('csw:TransactionResponse', namespaces)
> >>         ]
> >>
> >>         if self._exml.getroot().tag not in valid_xpaths:
> >>             raise RuntimeError, 'Document is XML, but not CSW-ish'
> >>
> >>
> >> but harvest is included, so I don't understand why it creates problems.
> >>
> >> I am using OWSLib 0.8.6
> >>
> >> Any help appreciated.
> >>
> >> Thanks
> >>
> >> Hilde
> >> _______________________________________________
> >> ckan-dev mailing list
> >> ckan-dev at lists.okfn.org
> >> https://lists.okfn.org/mailman/listinfo/ckan-dev
> >> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
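
Regarding the check quoted from owslib/csw.py: as far as the quoted snippet
shows, it only compares the root element of the response against a list of
known CSW root elements, so an <ows:Operation name="Harvest"> entry inside a
capabilities document would not by itself trigger the error. The
csw:HarvestResponse entry in that list is a response root element, not the
advertised operation. Below is a minimal standard-library sketch of the same
idea for inspecting a saved response by hand. It is not OWSLib's code: the
function names are hypothetical, and the namespace URIs are assumed to be the
CSW 2.0.2 and OWS ones.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for CSW 2.0.2 and its OWS common elements.
NAMESPACES = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "ows": "http://www.opengis.net/ows",
}

# Root elements accepted by the check quoted earlier in this thread.
VALID_ROOTS = [
    "ows:ExceptionReport",
    "csw:Capabilities",
    "csw:DescribeRecordResponse",
    "csw:GetDomainResponse",
    "csw:GetRecordsResponse",
    "csw:GetRecordByIdResponse",
    "csw:HarvestResponse",
    "csw:TransactionResponse",
]


def qualified(name):
    """Turn 'csw:Capabilities' into ElementTree's '{uri}Capabilities' form."""
    prefix, local = name.split(":")
    return "{%s}%s" % (NAMESPACES[prefix], local)


def root_tag(xml_text):
    """Return the namespace-qualified tag of the document's root element."""
    return ET.fromstring(xml_text).tag


def looks_cswish(xml_text):
    """True if the root element is one of the known CSW response roots."""
    return root_tag(xml_text) in {qualified(name) for name in VALID_ROOTS}


if __name__ == "__main__":
    sample = (
        '<csw:Capabilities'
        ' xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"'
        ' xmlns:ows="http://www.opengis.net/ows" version="2.0.2">'
        '<ows:OperationsMetadata><ows:Operation name="Harvest"/>'
        '</ows:OperationsMetadata></csw:Capabilities>'
    )
    print(root_tag(sample), looks_cswish(sample))
```

Printing root_tag() for the response the failing server actually returns
(for example an HTML error page, or a root element in a different namespace)
is a quick way to see what the check is rejecting.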