[ckan-dev] harvesting and ckan geo extensions

Wed Apr 6 08:08:20 UTC 2011

Hi

2011/4/5 William Waites <ww at styx.org>:
> So far so good, I added some tweaks to the documentation for how to
> configure the plugins, and reminders to actually install the package
> and put the requirements in setup.py so that they get installed
> properly.
>
> Some small observations more in the way of documentation and letting
> the others on the list know what is being done. Not criticisms, I
> think you've done a very good job.
>
> In the instructions for the API URL, I may be wrong, but I believe that
> the convention for ckanclient is to give http://example.org/api, not
> just http://example.org/. If so, it should probably be changed for
> consistency's sake.
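
If the instructions are changed, the harvester could also accept either form so that existing configurations keep working. A minimal sketch in Python; the helper name `normalise_api_url` is hypothetical, and it assumes the ckanclient convention is a base URL ending in `/api`:

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_api_url(url: str) -> str:
    """Return a CKAN API base URL ending in /api, without a trailing slash.

    Hypothetical helper; assumes the ckanclient convention of a base URL
    such as http://example.org/api.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    path = path.rstrip("/")
    if not path.endswith("/api"):
        # Site root (or a sub-path install) given: point it at the API.
        path = path + "/api"
    return urlunsplit((scheme, netloc, path, query, fragment))
```

For example, `http://example.org/`, `http://example.org` and `http://example.org/api/` all normalise to `http://example.org/api`.
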
>
> I notice the harvester is adding resources with relative URLs. This
> seems to be a pre-existing bug, and not a harmless one: it prevents the
> package from being edited, because those fields fail validation.
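
Resolving relative URLs at harvest time, against the address of the document they came from, would avoid the validation failure. A minimal sketch using the standard library's `urljoin`; `absolutise_resources` is a hypothetical helper, and it assumes resources are dicts carrying a `url` key:

```python
from urllib.parse import urljoin


def absolutise_resources(resources, source_url):
    """Return copies of resource dicts with every 'url' made absolute.

    Hypothetical helper: assumes each resource is a dict with an optional
    'url' key and that relative URLs should be resolved against the URL
    of the harvested document (source_url).
    """
    fixed = []
    for resource in resources:
        resource = dict(resource)  # don't mutate the caller's data
        if resource.get("url"):
            # urljoin leaves already-absolute URLs (http://, ftp://) as-is.
            resource["url"] = urljoin(source_url, resource["url"])
        fixed.append(resource)
    return fixed
```
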
>
> The authentication arrangement in view.py should probably be slackened
> a bit, since I don't think we need admin privileges to be able to just
> look at a map. Also if there is a viewable resource, probably we
> should have a smaller map without controls on the main package page,
> though I understand why you didn't do this straight away as it is a
> more invasive template change.
>
> On the treatment of the SRS in the extras field: I don't really know
> why we are putting a big blob of XML in there instead of just the
> well-known string identifier. I think this might be tripping up the
> indexing of some datasets, particularly as the UK very often uses its
> own national grid system. There are no specific tests for this; I'll
> write some once we reach consensus on whether we are going to put the
> SRID in the SRID field or keep the XML blob.
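
If the extra held the well-known identifier instead of XML, turning it into an integer SRID for PostGIS would be straightforward. A sketch assuming the identifier arrives in one of the common EPSG forms (`EPSG:27700`, an OGC URN, or an opengis.net URI); `srid_from_identifier` is a hypothetical helper. EPSG:27700 is the British National Grid:

```python
import re
from typing import Optional

# EPSG code at the end of "EPSG:27700", "urn:ogc:def:crs:EPSG::27700",
# "urn:ogc:def:crs:EPSG:6.6:27700" or
# "http://www.opengis.net/def/crs/EPSG/0/27700".
_EPSG_RE = re.compile(r"EPSG(?:[:/][^:/]*)?[:/](\d+)$", re.IGNORECASE)


def srid_from_identifier(identifier: str) -> Optional[int]:
    """Return the integer SRID for a CRS identifier, or None if unrecognised."""
    identifier = identifier.strip()
    if identifier.isdigit():
        # A bare number is taken to be an EPSG code already.
        return int(identifier)
    match = _EPSG_RE.search(identifier)
    return int(match.group(1)) if match else None
```

Returning `None` rather than guessing lets the indexer skip, and report, datasets whose SRS it cannot interpret.
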
>
> On the treatment of the bounding box, I mention this here because I
> know that Friedrich and I discussed it a while back. Having a separate
> extra for each of the corner coordinates (4 extras in all) is probably
> not as good as having just one BBOX extra. Better still might be an
> "envelope" extra with WKT in it. This isn't as succinct as just four
> numbers in a bbox, but it means that if we have better shape
> information for the perimeter of a dataset's coverage (which hopefully
> we will have in the future, at least for some datasets), then we can
> actually use it in the database. It would mean changing the geometry
> column into a geometry collection and doing some minimal processing on
> it instead of using a polygon.
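
Building the proposed "envelope" WKT from the four bbox numbers is mechanical. A minimal sketch; `bbox_to_wkt` is a hypothetical name, and boxes crossing the antimeridian are deliberately not handled:

```python
def bbox_to_wkt(west: float, south: float, east: float, north: float) -> str:
    """Return a closed WKT POLYGON for a bounding box.

    x is longitude (or easting), y is latitude (or northing). Boxes that
    cross the antimeridian (west > east) are rejected, not handled.
    """
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    # Counter-clockwise ring; the first corner is repeated to close it.
    corners = [(west, south), (east, south), (east, north),
               (west, north), (west, south)]
    # repr() of a float round-trips exactly and avoids %g's rounding.
    ring = ", ".join(f"{float(x)!r} {float(y)!r}" for x, y in corners)
    return f"POLYGON(({ring}))"
```

PostGIS can load the result with `ST_GeomFromText(wkt, srid)`; wrapping it as `GEOMETRYCOLLECTION(POLYGON(...))` would suit a geometry-collection column.
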
>
> Back to the cosmetic front, it probably would be a good idea to put in
> a base layer of vmap0 or something to aid in orientation.
>
> I guess the geo search and handling of envelope/bbox extras should
> really be in a ckanext-geo and not in harvesting, since it has nothing
> to do with harvesting really, nor with CSW or DGU. That way any package
> with that extra would get indexed and displayed.
>
> We could promote the geo search alongside the regular search. Really
> joining them up properly would be quite difficult, because one uses
> Solr and the other uses PostGIS, but it would be nice to make the
> possibility of browsing the data by map more obvious.
>
> Now, how do we treat the models? This extension creates a database
> table, and that has to be done by hand. Am I correct that there is
> still no plugin hook for registering a new table to be created by the
> normal machinery? Will the normal machinery handle geo columns that
> need to be created by calling a stored procedure?

Yes, you are correct.  There's been some discussion about this; the
(current) conclusion is that each plugin should be responsible for its
own data storage (which may be postgres or anything else) and
shouldn't maintain any constraints against the main database, the
concern being that anything other than the loosest coupling will make
things brittle, hard to test, etc.  Therefore there are no plans to let
plugins register tables to be created by the normal machinery.
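
Under that approach the plugin creates its own table. A minimal sketch of the statements a spatial plugin might run itself, assuming PostGIS; the table name `package_extent`, the column names and the default SRID 4326 are hypothetical. The geometry column is added with PostGIS's `AddGeometryColumn(table, column, srid, type, dimension)`, which is the stored-procedure call the question refers to:

```python
from typing import List


def extent_table_ddl(table: str = "package_extent", srid: int = 4326) -> List[str]:
    """Return SQL a plugin could run itself to create its own spatial table.

    Table name, column names and default SRID are illustrative. Only pass
    trusted, hard-coded names: identifiers are interpolated, not
    parameterised.
    """
    return [
        # package_id deliberately has no FOREIGN KEY to the core package
        # table, keeping the plugin loosely coupled to the main database.
        f"CREATE TABLE {table} (package_id TEXT PRIMARY KEY)",
        # PostGIS adds geometry columns through a function call:
        # AddGeometryColumn(table, column, srid, type, dimension).
        # 'GEOMETRY' accepts polygons and geometry collections alike.
        f"SELECT AddGeometryColumn('{table}', 'the_geom', {srid}, 'GEOMETRY', 2)",
        # Spatial index so bounding-box searches stay fast.
        f"CREATE INDEX {table}_the_geom_idx ON {table} USING GIST (the_geom)",
    ]
```

The plugin would execute these once, for example from a setup command of its own, over whatever database connection it already holds.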

> For augmenting the package display and the search interface, possibly
> including the top navigation bar, do we have a convention for
> overriding just part of the base-system templates? If necessary, do we
> break apart the templates now, as needed, so that plugin packages can
> use Genshi to override bits of them, or do we just try to maintain
> different versions of the same thing in the plugin?

There are developing conventions...

Taking recent examples I'm working on, there are two ways of
overriding parts of templates.  You may already be aware of them, but
in the interests of developing documentation...!

1) Using Genshi's match templates with xpath in the layout.html file.
Genshi docs: http://genshi.edgewall.org/wiki/GenshiTutorial#AddingaLayoutTemplate
Example: https://bitbucket.org/okfn/ckanext-datano/src/1aa1c54fdab6/ckanext/datano/theme/templates/layout.html

2) Using a Genshi stream filter plugged in using CKAN's
IGenshiStreamFilter extension point (a stream-based filter / map API
using a subset of xpath to match specific parts of the stream).
Example: https://bitbucket.org/sebbacon/ckanext-googleanalytics/src/dd0f6090c507/ckanext/googleanalytics/plugin.py#cl-37
This example inserts Google Analytics code in the <head>, and also
appends download information to specific resources on a package page.

Note that in the latter example, ideally I should add some ids to the
core templates to make the xpath matching more accurate, or we'll get
some strange, unexpected effects at some point...!

I've found that these two methods have enabled me to do all the
extension-based template customisations I've needed.

Seb