[ckan-dev] harvesting and ckan geo extensions

James Gardner james at 3aims.com
Wed Apr 6 09:35:29 UTC 2011


Hi all,

On 06/04/11 10:21, David Read wrote:
> 2011/4/6 Adrià Mercader<amercadero at gmail.com>:
>> Hi William and others,
>>
>> On 5 April 2011 23:14, William Waites <ww at styx.org> wrote:
>>> So far so good, I added some tweaks to the documentation for how to
>>> configure the plugins, and reminders to actually install the package
>>> and put the requirements in the setup.py so that they get installed
>>> comme il faut.
>> Aren't plugins supposed to be separated by spaces in the ini file, not commas?
>>     ckan.plugins = cswserver dgu_form_api harvest
>> instead of
>>     ckan.plugins = cswserver,dgu_form_api,harvest

Yes, we use spaces, Will.
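
A minimal sketch of why the separator matters, assuming the value of
ckan.plugins is split on whitespace. The helper name parse_plugins is
hypothetical and is not CKAN's actual loader:

```python
def parse_plugins(value):
    """Split a ckan.plugins-style value on whitespace (assumed behaviour)."""
    return value.split()


# Space-separated: three separate plugin names.
spaced = parse_plugins("cswserver dgu_form_api harvest")

# Comma-separated: one single (non-existent) plugin name.
commas = parse_plugins("cswserver,dgu_form_api,harvest")

print(len(spaced), spaced)
print(len(commas), commas)
```

Under that assumption the comma-separated form is read as one plugin
name, so none of the three plugins would be found.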

>>> In the instructions for the API URL, I may be wrong but I believe that
>>> the convention for ckanclient is to put http://example.org/api not
>>> just http://example.org/. If I am not wrong about this, it should
>>> probably be changed for consistency's sake.
>> Not exactly sure what you mean. Do you mean the default value in ckan.api_url?
>> In any case, this will probably get removed in the current refactoring
>> as we won't need
>> to query the api
> Hmm the docs have an example:
>
> ckan.api_url=http://scotdata.ckan.net/api
>
> So maybe Will is talking about a different setting than ckan.api_url?

We want relative URLs here to support DGU temporarily. Please, please, 
please don't go changing things like this at this stage. Once the code 
is stable and deployed for DGU we can look at improvements for wider 
use, but that certainly isn't our main concern 4 weeks before a major 
deadline.

>>> I notice the harvester is adding resources with relative URLs, this
>>> seems to be a pre-existing bug not least because it prevents the
>>> package from being edited because those fields fail validation.
>>> The authentication arrangement in view.py should probably be slackened
>>> a bit, since I don't think we need admin privileges to be able to just
>>> look at a map.
>> Agreed. I think it's made this way in the current context of DGU, but
>> when this gets
>> moved to a new ckanext-georelatedstuff I don't think it will be necessary
> I don't know much about harvesting and its direction, but I put the
> sysadmin authz requirement on all the harvest view interface and
> matching API calls purely because it was easy, it works for DGU right
> now, and it prompts someone to properly plan what authz we do need.
> This might well be a protection object for a harvest source or doc,
> along the lines of everything else. So I think rather than make a
> piecemeal change to remove completely authz on one particular call,
> someone should argue the whole harvesting/geo/csw thing together, in
> light of where it's going.

Again, Adrià has implemented exactly what I asked for, which is what we
need to support at the moment; sysadmin privileges are correct. We can
look at changing this after the deadline.

>>> Also if there is a viewable resource, probably we
>>> should have a smaller map without controls on the main package page,
>>> though I understand why you didn't do this straight away as it is a
>>> more invasive template change.
>> That would be nice, but it's a little bit trickier. When dealing with
>> arbitrary WMS servers it is very difficult to get a representative
>> snapshot of the
>> maps behind it. In most cases, the user will need to zoom in or out to
>> actually see the maps in context.
>>

Out of scope for the time being; let's revisit in mid-May.

>>> On the treatment of the SRS in the extras field: I don't really know
>>> why we are putting a big blob of XML in there instead of just using
>>> the well-known string identifier. I think this might be tripping up
>>> the indexing of some datasets, particularly as the UK very often uses
>>> its own national grid system. There are no particular tests for
>>> this; I'll write some once we get some consensus about whether we are
>>> going to actually put the SRID in the SRID field or keep the XML blob.
>> I modified that parser a while ago to store the SRID, not the XML. If
>> it's not doing it, it's a bug:
>> https://bitbucket.org/okfn/ckanext-harvest/src/1dd85319a6bf/ckanext/harvest/model/__init__.py#cl-359
>>
>>
>>> On the treatment of the bounding box, I mention this here because I
>>> know that Friedrich and I had discussed this a while back. Probably
>>> having a separate extra for each of the coordinates of the corners, or
>>> 4 extras in all is not as good as having just one BBOX extra. Better
>>> still might be to have an "envelope" extra with WKT in it.
>> +1 To have just one extra field. Currently it just uses the existing
>> code used to parse GEMINI records

Fine with me, but not exactly a high priority. The bounding box is 
always lat/long and the WMSs we need to support are all ETRS 89.
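
A rough sketch of the single "envelope" extra idea, assuming the four
corner values are available as lat/long numbers. The function name
bbox_to_wkt and the argument names are hypothetical and are not taken
from the harvester code:

```python
def bbox_to_wkt(west, south, east, north):
    """Return a closed WKT polygon for a lat/long bounding box.

    WKT coordinate pairs are written as "x y", i.e. "longitude latitude".
    """
    west, south, east, north = (float(v) for v in (west, south, east, north))
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    corners = [
        (west, south),
        (east, south),
        (east, north),
        (west, north),
        (west, south),  # repeat the first corner to close the ring
    ]
    ring = ", ".join("%s %s" % (x, y) for x, y in corners)
    return "POLYGON((%s))" % ring


# Illustrative values only, not taken from a real record.
print(bbox_to_wkt(-8.5, 49.5, 2.0, 61.0))
```

One string like this could replace the four separate corner extras,
which is the trade-off discussed above.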

>>> Back to the cosmetic front, it probably would be a good idea to put in
>>> a base layer of vmap0 or something to aid in orientation.
>> It will definitely help. vmap0 is not the nicest base map around, but
>> it's the only one I know in WGS84 (4326) which has global coverage.

Please, this isn't a priority. Yes, we should look at all these 
improvements, but not until everything we need to deliver for DGU is 
perfect. There's a risk that "improvements" could actually interfere 
since DGU is not exactly a standard case, but it's the one we need
working first.

>>> I guess the geo search and handling of envelope/bbox extras should
>>> really be in a ckanext-geo and not in harvesting since it has nothing
>>> to do with harvesting really, nor with CSW or DGU. That way anything
>>> with that extra would get indexed and displayed.
>> We are indeed planning to move the GEMINI stuff to ckanext-inspire and
>> the spatial search and wms preview to ckanext-geo
>>
>>
>>> All in all, very promising.
>>>
>> Please bear in mind that it's just a preliminary version :)
+1
>> I think that after the refactoring, when all things are where they are
>> supposed to be, we will be able to polish all of these details

Exactly, all in good time ;) Sorry to put a damper on things but I want
us to focus on the correct things first, which at the moment is the
harvesting refactor to support pluggable harvesting backends and the queue.

Cheers,

James




