[ckan-dev] Geonode CWI integration

Tom Kralidis tomkralidis at gmail.com
Wed Mar 12 12:42:25 UTC 2014



On Wed, 12 Mar 2014, Adrià Mercader wrote:

> Date: Wed, 12 Mar 2014 10:46:27 +0000
> From: Adrià Mercader <adria.mercader at okfn.org>
> To: CKAN Development Discussions <ckan-dev at lists.okfn.org>,
>     Tom Kralidis <tomkralidis at gmail.com>
> Subject: Re: [ckan-dev] Geonode CWI integration
> 
> Hi Reinier,
>
> I'm going to go with Philippe and assume you are talking about CSW services :)
>
> The issues regarding the format detection for remote resources came up
> right from the start of working with ISO-based CSW harvesting and have
> been a pain ever since. For all its zillion fields and lengthy
> standard, it's not an easy task to infer them from a harvested ISO
> record.
>

Amen.

> For CKAN to handle the previews correctly we will need to assign a
> correct format to them, eg png to the png images, geojson to geojson
> files, wms to wms endpoints, etc. Now, let's take the example you
> point to, each of the online resources looks like this:
>
>
> <gmd:onLine>
> <gmd:CI_OnlineResource>
> <gmd:linkage>
> <gmd:URL>
> http://maps.nemaug.org/geoserver/wms?layers=geonode%3Awildlifesanctuaries&width=469&bbox=29.84592360669541%2C-0.14586631603160494%2C33.23719845518447%2C3.8276767112473573&service=WMS&format=application%2Fpdf&srs=EPSG%3A4326&request=GetMap&height=550
> </gmd:URL>
> </gmd:linkage>
> <gmd:protocol>
> <gco:CharacterString>WWW:DOWNLOAD-1.0-http--download</gco:CharacterString>
> </gmd:protocol>
> <gmd:name>
> <gco:CharacterString>wildlifesanctuaries.pdf</gco:CharacterString>
> </gmd:name>
> <gmd:description>
> <gco:CharacterString>Wildlife sanctuaries (PDF Format)</gco:CharacterString>
> </gmd:description>
> </gmd:CI_OnlineResource>
> </gmd:onLine>
>
> Name and description contain pdf, and there's also the format
> parameter in the WMS url, but these are there just because we are
> harvesting from a GeoNode instance.

I think gmd:protocol provides the value-added hint to the client.

The fact that gmd:protocol is 'WWW:DOWNLOAD-1.0-http--download' should tell
the client to fetch the resource via HTTP.  The MIME types from the HTTP
response should tell the client what the format is.

In the case above, the fact that 'WWW:DOWNLOAD-1.0-http--download' is used
makes the WMS-ish looking URL in gmd:URL irrelevant.  In this case, WMS just
happens to be the way the download link is realized by the server (this could
also have been a static file, etc.).

Identifying a WMS link as a gmd:URL is a whole other thing, in which case
the gmd:protocol would be something like 'OGC:WMS'.
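
To make that concrete, here is a rough Python sketch of a client that reads gmd:protocol first and only then decides what the URL means. It assumes the standard ISO 19139 gmd/gco namespaces; the function name classify_online_resource and the labels it returns are made up for this example (this is not CKAN or pycsw code), and the sample URL is shortened.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Shortened version of the online resource quoted above
SAMPLE = """
<gmd:onLine xmlns:gmd="http://www.isotc211.org/2005/gmd"
            xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:CI_OnlineResource>
    <gmd:linkage>
      <gmd:URL>http://maps.nemaug.org/geoserver/wms?service=WMS&amp;request=GetMap&amp;format=application%2Fpdf</gmd:URL>
    </gmd:linkage>
    <gmd:protocol>
      <gco:CharacterString>WWW:DOWNLOAD-1.0-http--download</gco:CharacterString>
    </gmd:protocol>
  </gmd:CI_OnlineResource>
</gmd:onLine>
"""


def classify_online_resource(xml_text):
    """Hypothetical helper: classify a link by gmd:protocol, not by its URL."""
    root = ET.fromstring(xml_text)
    url = (root.findtext(".//gmd:linkage/gmd:URL", default="", namespaces=NS) or "").strip()
    protocol = (
        root.findtext(".//gmd:protocol/gco:CharacterString", default="", namespaces=NS) or ""
    ).strip()

    if protocol.startswith("WWW:DOWNLOAD"):
        kind = "download"   # plain HTTP fetch, even if the URL looks like WMS
    elif protocol.startswith("OGC:"):
        kind = "service"    # e.g. OGC:WMS, bind as a service endpoint
    else:
        kind = "unknown"
    return {"url": url, "protocol": protocol, "kind": kind}


result = classify_online_resource(SAMPLE)
print(result["kind"], result["protocol"])
```

The WMS-looking URL never enters the decision here; only the protocol value does.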

> The most reliable way we found of guessing what the online resources
> were actually pointing at was trying to guess it from the url and file
> extension, looking for common patterns [1], which is a bit limited as
> your case shows (resources are flagged as wms or wfs and not the
> actual output format).
>
> I'm not sure of the best way to move forward with this, and to what
> extent the excellent work started by Tom and others around catalog
> interop [2] aims to address this (I need to catch up on that).
>
> @Tom could the applicationProfile of CI_OnlineResource be used for
> defining the expected resource format?
>

Not sure, but there is gmd:function which must be one of:

download
information
offlineAccess
order
search

...however (IMHO) this enumeration is a bit outdated (unless a downstream
profile extends it); I would add 'service' or 'api'.

In general, there needs to be a harmonized approach to identifying links
in metadata (ISO or Dublin Core), which means harmonized gmd:protocol values.

Harmonized gmd:protocol values should be enough to tip off the client
about how to interact with the link:

- direct download (fetch URL, sniff MIME-type)
- website/short URL for info
- service API (the client binds according to the value of
gmd:protocol)
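
The three cases could be sketched in Python roughly as follows. This is only an illustration under assumptions: the protocol prefixes checked ('WWW:DOWNLOAD', 'WWW:LINK', 'OGC:') stand in for whatever harmonized values get agreed on, the MIME-to-format table is a made-up subset, and the caller is expected to pass in the Content-Type header it got from the HTTP response (no network access happens here).

```python
# Hypothetical subset of a MIME type -> format label table
MIME_TO_FORMAT = {
    "application/pdf": "pdf",
    "image/png": "png",
    "application/json": "json",
    "application/zip": "zip",
}


def format_from_content_type(content_type):
    """Map an HTTP Content-Type header value to a format label, or None."""
    # Drop parameters such as "; charset=utf-8" and normalise case
    mime = content_type.split(";", 1)[0].strip().lower()
    return MIME_TO_FORMAT.get(mime)


def plan_interaction(protocol, content_type=None):
    """Return (kind, detail) describing how a client should treat a link."""
    protocol = (protocol or "").strip()
    if protocol.startswith("WWW:DOWNLOAD"):
        # Direct download: the format comes from the sniffed MIME type
        fmt = format_from_content_type(content_type) if content_type else None
        return ("download", fmt)
    if protocol.startswith("WWW:LINK"):
        # Website: just a URL for more information
        return ("website", None)
    if protocol.startswith("OGC:"):
        # Service API: bind according to the service type, e.g. "wms"
        service = protocol.split(":", 1)[1].split("-", 1)[0].lower()
        return ("service", service)
    return ("unknown", None)


print(plan_interaction("WWW:DOWNLOAD-1.0-http--download", "application/pdf"))
print(plan_interaction("OGC:WMS"))
```

Keeping the decision keyed on gmd:protocol alone means the client never has to guess from the shape of the URL.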


> We could add some more magic to the logic for guessing the resource
> format so that the name or the format param is taken into account in
> GeoNode's case.
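
For illustration only, that kind of extra guessing could look like the Python sketch below. It is not the ckanext-spatial logic; guess_format is a hypothetical helper, the URL is a shortened form of the GeoNode example, and reducing a MIME type to the part after the slash is a deliberately crude assumption.

```python
import os
from urllib.parse import urlparse, parse_qs


def guess_format(url, name=None):
    """Hypothetical helper: guess a format from the name or 'format' param."""
    # 1. File extension of the resource name, e.g. "wildlifesanctuaries.pdf"
    if name:
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if ext:
            return ext

    # 2. "format" query parameter holding a MIME type, e.g. application/pdf
    #    (parse_qs already decodes %2F to "/")
    params = parse_qs(urlparse(url).query)
    for key, values in params.items():
        if key.lower() == "format" and values:
            mime = values[0].lower()
            # Crude: keep only the subtype, so "application/pdf" -> "pdf"
            return mime.split("/", 1)[-1]

    return None


url = ("http://maps.nemaug.org/geoserver/wms"
       "?service=WMS&format=application%2Fpdf&request=GetMap")
print(guess_format(url, "wildlifesanctuaries.pdf"))
print(guess_format(url))
```

This only helps where the name or URL happens to carry the format, as GeoNode's do; it is not a substitute for a proper gmd:protocol value.
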
>
>
> @Reinier note that you can customize the harvested datasets that will
> be created in CKAN, tweaking the dict that will be sent to the
> create/update functions to manually set formats, remove unwanted ones,
> etc. [3]
> Right now you need to extend the base CSW harvester, but I'm working
> on a couple of extension points that should make this easier.
>
> Happy to continue the discussion on whatever list/channel

The main challenge here is getting agreement and implementation by CSW
implementations, and getting the word out to metadata content providers.

Can we continue this on Cat-Interop?

GitHub: https://github.com/OSGeo/Cat-Interop
mailing list: http://lists.osgeo.org/cgi-bin/mailman/listinfo/cat-interop

>
> Adrià
>
> [1] https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L58
> [2] https://github.com/OSGeo/Cat-Interop
> [3] http://ckanext-spatial.readthedocs.org/en/latest/harvesters.html#customizing-the-harvesters
>
> On 12 March 2014 09:55, Philippe Duchesne <pduchesne at gmail.com> wrote:
>> Hello Reinier,
>>
>> what do you call CWI ? do you mean CSW ?
>>
>> --p.
>>
>>
>> On Wed, Mar 12, 2014 at 9:37 AM, Reinier Battenberg
>> <reinier.battenberg at mountbatten.net> wrote:
>>>
>>> Hi,
>>>
>>> In our setup ( www.data.ug ) we run geonode for our geospatial data, and
>>> CKAN for everything + our geospatial data. With a CMS of choice on top of
>>> that, this makes a pretty nice Open Data architecture.
>>>
>>> We are already harvesting 2 geonodes (one is not our own) and solving the
>>> issues that come with that.
>>>
>>> One issue is that the CWI that geonode produces results in pretty useless
>>> resources in CKAN. eg. http://catalog.data.ug/dataset/wildlife-sanctuaries
>>> Note that none of the previews work.
>>>
>>> CWI seems to be a flexible standard that can produce descriptions of
>>> datasets differently, so we are having a discussion on the geonode mailing
>>> list as to how to change the CWI that geonode produces.
>>>
>>> It would be great if Adria and others who are interested could join the
>>> discussion, so the outcome of the changes to geonode would be an easier and
>>> slicker integration between these 2 great tools.
>>>
>>> The issue is here:
>>> https://groups.google.com/forum/#!topic/geonode-users/ueNTQ0DWl9Y
>>>


