[ckan-dev] CSW harvesting update

James Gardner james at 3aims.com
Sun Feb 6 16:30:28 UTC 2011


Hi Will,

On 06/02/11 13:43, William Waites wrote:
> Just an update re: feature 885, using owslib to talk to
> a CSW server instead of our own custom client. I've done
> a partial merge in the feature-885-owslib branch. This
> removes the ckan/lib/cswclient.py and its paraphernalia and
> changes the harvesting code to use the owslib implementation.

I've just looked in that branch and can't see any import of owslib? 
Also, the controller/harvesting.py file has gone from that branch. 
Sorry, I don't understand. What exactly did you merge? Where is your new 
code? I was just expecting you to take default and replace the few 
lines that make the CSW calls.

> This is "partial" because though it uses the owslib client
> to talk to the server, it still uses John's code to parse
> the result and turn it into a CKAN package. This parsing
> and transforming code should probably be replaced but
> needs to be done carefully because the UKLII requirements
> are stricter than generic ISO19139. This means that we
> cannot use this to harvest from just any CSW service;
> the Dutch national registry, for example, serves things
> that we would consider invalid.

That's absolutely fine, though. I wouldn't want the parsing code changed, 
because a huge amount of effort has gone into deciding how it should 
work, and after Seb's refactor it is fairly nice to work with. Any 
changes to it would hinder rather than help at the moment.

> Ideally we could handle
> this gracefully and just have some stricter checking
> for UK purposes, which will come from the schematron
> validation step, but the way things are laid out makes
> this difficult to do directly. So, as the validation hasn't
> been implemented yet, we parse the document twice as an
> interim measure.

Sure, that would be nice, but this has to be live next Monday, all 
working and ready for users, so supporting other CSWs is out of scope 
at this stage. We just need it deployed, working and tested for the 
cases we have agreed to deliver.

> The immediate benefit of using the owslib client is that
> it lets us page through results, which are not all returned
> in one request. This is, of course, critical as without
> it we would get some fixed server-specific number of
> records and miss the rest. This now works, but it
> can also be improved: at the moment the code makes one
> request to get the brief record descriptions for their
> identifiers and then a separate request for the detail of
> each record. It would be better to fetch the full details
> of a larger batch of records per request, in sequence; it
> would be nicer of us to make fewer requests to the
> services that we are aggregating.
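> The batched approach can be sketched roughly as follows. This is
> a minimal illustration, not the code in the branch: `harvest_all`,
> `fetch_page` and `make_fake_server` are hypothetical names, and
> `fetch_page` stands in for a CSW GetRecords call that asks for the
> "full" ElementSetName with a given startPosition and maxRecords.
> It relies only on the CSW 2.0.2 paging convention that
> startPosition is 1-based and nextRecord is 0 when nothing is left.
>
> ```python
> from typing import Callable, Iterator, List, Tuple
>
> # Hypothetical: fetch_page(start_position, max_records) returns
> # (full_records, next_record), as a GetRecords request with
> # ElementSetName "full" would. next_record == 0 means "no more".
> FetchPage = Callable[[int, int], Tuple[List[dict], int]]
>
>
> def harvest_all(fetch_page: FetchPage, page_size: int = 10) -> Iterator[dict]:
>     """Yield every full record, one page per request, following nextRecord."""
>     position = 1  # CSW startPosition is 1-based
>     while True:
>         records, next_record = fetch_page(position, page_size)
>         yield from records
>         # Stop on an empty page, on nextRecord == 0, or if a misbehaving
>         # server would send us backwards (avoids an infinite loop).
>         if not records or next_record <= position:
>             break
>         position = next_record
>
>
> def make_fake_server(total: int):
>     """Hypothetical in-memory stand-in for a CSW endpoint, for testing."""
>     catalogue = [{"identifier": f"rec-{i}"} for i in range(1, total + 1)]
>     calls: List[int] = []
>
>     def fetch_page(start: int, max_records: int) -> Tuple[List[dict], int]:
>         calls.append(start)
>         page = catalogue[start - 1:start - 1 + max_records]
>         last = start - 1 + len(page)
>         next_record = last + 1 if last < total else 0
>         return page, next_record
>
>     return fetch_page, calls
>
>
> fetch, calls = make_fake_server(25)
> records = list(harvest_all(fetch, page_size=10))
> print(len(records), calls)  # 25 [1, 11, 21]
> ```
>
> For 25 records with a page size of 10 this makes three requests,
> where the brief-then-detail approach would make three brief
> listing requests plus 25 per-record lookups.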

As long as it works I don't care at this point ;)

> James, I looked at your patch to owslib in some detail
> and we don't seem to need it, apart from a non-critical
> passage that changes owslib's idea of which etree
> implementation to use. They make a different choice from
> us, preferring, in order, the external elementtree, the
> internal elementtree and lxml's one, whereas we use lxml
> unconditionally. I should note that while their APIs are
> similar, they are not fully compatible: some things, like
> the pretty_print argument to etree.tostring(), are only
> supported by lxml. I have attached this small patch to this
> message, and am trying to track down Sean Gillies to see
> what his opinion of it is. If you could test this branch
> in your environment today, before your meetings tomorrow,
> to make sure it does what you expect, it would be appreciated.
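> To illustrate the compatibility point, here is a minimal sketch
> of choosing lxml first and falling back to the standard library's
> xml.etree.ElementTree. It is not the attached patch and not
> owslib's actual import logic; `HAVE_LXML` and `to_string` are
> hypothetical names. The helper only passes pretty_print when lxml
> is in use, since the stdlib tostring() does not accept it.
>
> ```python
> # Prefer lxml; fall back to the standard library implementation.
> try:
>     from lxml import etree
>     HAVE_LXML = True
> except ImportError:
>     import xml.etree.ElementTree as etree
>     HAVE_LXML = False
>
>
> def to_string(elem) -> str:
>     """Serialise an element, pretty-printing only where supported."""
>     if HAVE_LXML:
>         # pretty_print is an lxml-only keyword argument.
>         return etree.tostring(elem, pretty_print=True).decode("utf-8")
>     # The stdlib version would raise TypeError on pretty_print.
>     return etree.tostring(elem).decode("us-ascii")
>
>
> root = etree.Element("record")
> title = etree.SubElement(root, "title")
> title.text = "Example"
> print(to_string(root))
> ```
>
> Code that calls tostring(..., pretty_print=True) directly will
> work under lxml but fail under elementtree, which is why the
> import order matters when the two are mixed.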

OK, I will. Again, lxml vs elementtree is the least of my worries at the 
moment. More important is that I couldn't tell which were your changes 
and which came from the merge, and the harvesting code seems to have been 
taken out of the branch, so which branch were you expecting me to deploy?

> Next on my plate here are the schematron validation and
> a CSW server. The latter, I think, would be best implemented
> as a CKAN client, essentially a standalone proxy for the
> API.

That sounds perfect, thanks a lot. If you could just clarify what you 
changed I'll get your code merged to default and deployed.

Thanks, Will,

James
