[ckan-dev] ckanext-spatial and pycsw synchronization workflow

Tom Kralidis tomkralidis at gmail.com
Wed Nov 20 13:08:46 UTC 2013



On Tue, 19 Nov 2013, Adrià Mercader wrote:

> Date: Tue, 19 Nov 2013 18:05:29 +0000
> From: Adrià Mercader <adria.mercader at okfn.org>
> Reply-To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
> To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
> Subject: Re: [ckan-dev] ckanext-spatial and pycsw synchronization workflow
> 
> Hi Tom,
>
> On 15 November 2013 00:34, Tom Kralidis <tomkralidis at gmail.com> wrote:
>
>> ckanext-spatial docs mention that only harvested documents are made
>> available to pycsw.
>
> Feeding pycsw an ISO document (harvested from e.g. a CSW server in this
> case) and letting it deal with the parsing and storing in the repo seemed
> like the easiest and fastest solution for a first step.
>
>> What is blocking us from modifying
>> https://github.com/okfn/ckanext-spatial/blob/master/bin/ckan_pycsw.py
>> to sync all CKAN metadata, not just harvested metadata?
>
> If I understood correctly (please correct me if wrong) pycsw always
> needs an xml document to add a record to its repository.
>

Correct.


> If that is the case, if there was a reliable way of generating these
> metadata files from a CKAN domain object (eg this [1] in JSON) as you
> suggest later on we could import those as we do with the harvested
> ones already.

With the current CKAN/pycsw integration approach (synchronization), agreed.
I've written a generic JSON -> ISO XML jinja2 template that fits this
approach: one would render it somewhere around
https://github.com/okfn/ckanext-spatial/blob/master/bin/ckan_pycsw.py#L155
so that pycsw always gets an ISO XML document, whether the record is local
or harvested.
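
As a rough illustration of that serialization step (not the jinja2 template
itself), here is a dependency-free sketch that uses the standard library's
ElementTree instead of jinja2. The function name ckan_dict_to_iso and the
choice of fields are assumptions made for the example, and the output is
only a skeleton, not a complete or schema-valid ISO 19139 record.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespaces
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _text(parent, tag, value):
    """Append <gmd:tag><gco:CharacterString>value</...></...> to parent."""
    wrapper = ET.SubElement(parent, "{%s}%s" % (GMD, tag))
    ET.SubElement(wrapper, "{%s}CharacterString" % GCO).text = value
    return wrapper


def ckan_dict_to_iso(pkg):
    """Hypothetical helper: map a few CKAN package fields to ISO XML.

    Only id, title (falling back to name) and notes are mapped here;
    a real mapping would cover many more elements.
    """
    root = ET.Element("{%s}MD_Metadata" % GMD)
    _text(root, "fileIdentifier", pkg["id"])
    ident = ET.SubElement(root, "{%s}identificationInfo" % GMD)
    data_ident = ET.SubElement(ident, "{%s}MD_DataIdentification" % GMD)
    citation = ET.SubElement(data_ident, "{%s}citation" % GMD)
    ci_citation = ET.SubElement(citation, "{%s}CI_Citation" % GMD)
    _text(ci_citation, "title", pkg.get("title") or pkg["name"])
    _text(data_ident, "abstract", pkg.get("notes") or "")
    return ET.tostring(root, encoding="unicode")


# Example input shaped like the result of package_show (values made up)
pkg = {"id": "abc-123", "name": "test-preview-geojson",
       "title": "Test & preview", "notes": "A local, non-harvested dataset"}
xml_doc = ckan_dict_to_iso(pkg)
```

Building the tree with ElementTree rather than string substitution means
characters such as "&" in titles are escaped automatically; a jinja2
template would need autoescaping enabled to get the same guarantee.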

> The challenge will be to do this in a generic enough way that doesn't
> involve users writing a custom template for their case. Ryan Clark
> from the Arizona Geological Survey did a great job in this direction
> using a template (adapted to their own needs) that was filled with
> values from the CKAN dataset.
>
> https://gist.github.com/rclark/5886908
>

To clarify, Ryan's gist represents deeper integration, i.e. no need for
synchronization, which is different from the current CKAN/pycsw integration
approach, correct?  I'm guessing this is the desired target, since it removes
duplication at the database level: pycsw reads directly from the same, single
CKAN database.  This is how it's done for both the GeoNode and
OpenDataCatalog projects.

(In this case CSW-T transactions are always turned off, as CKAN proper would
govern insert/update/delete.)
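
For illustration only, the deeper-integration setup could be expressed as a
pycsw configuration along these lines. The sketch writes an INI-style config
with CSW-T switched off and the repository pointed at the CKAN database. The
option names (transactions, database, table, mappings) are the ones I
understand pycsw's INI configuration to use and should be checked against the
installed version; the database URL, table name and mappings path are
made-up placeholders.

```python
import configparser
import io


def build_pycsw_config(ckan_db_url, table, mappings_path):
    """Hypothetical helper: return pycsw-style INI text for a shared database."""
    # interpolation=None so that '%' in a database URL is kept verbatim
    cfg = configparser.ConfigParser(interpolation=None)
    # CKAN governs insert/update/delete, so CSW-T stays off
    cfg["manager"] = {"transactions": "false"}
    # pycsw reads from the CKAN database instead of its own copy
    cfg["repository"] = {
        "database": ckan_db_url,
        "table": table,
        "mappings": mappings_path,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Placeholder values, not taken from any real deployment
print(build_pycsw_config(
    "postgresql://ckan_user:secret@localhost/ckan",
    "package",
    "/etc/pycsw/ckan_mappings.py",
))
```

The mappings file is where the CKAN columns would be mapped onto the fields
pycsw queries on, which is the part that actually needs designing.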


> It all basically depends on how easy it is to generate meaningful ISO
> documents from a CKAN dataset that can be parsed and imported by
> pycsw.
>

We might want to discuss the path forward.  Options:


- extend *existing* approach to serialize JSON to ISO XML when
synchronizing local, non-harvested records
- build on Ryan's implementation for deeper integration

>
> As an aside, is there a lower level API on pycsw that allows creating
> records, for instance from a dict or list of values? This way we would
> avoid having to generate the metadata file and would just need to adapt
> the CKAN dict. (although I'm not sure if you still need an actual metadata
> file stored, as I see a "xml" field in the pycsw model)
>
>

In theory, one could create a pycsw record dict like
pycsw.metadata.parse_record does
(https://github.com/geopython/pycsw/blob/master/pycsw/metadata.py#L40)
and do an insert/update accordingly, but we still need an XML document
(required in the model).  It serves the CSW GetRecords elementsetname=full
use case, where the stored full XML is returned as an early out, as opposed
to elementsetname=brief|summary.

Having said this, you *could* have the XML field be a virtual field (we never
query on it), or construct it right before insert (e.g. we do this as a
signal in GeoNode/Django).
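
To make the two variants concrete, here is a minimal sketch in plain Python,
independent of pycsw and Django. The Record class, its fields, the
prepare_for_insert hook and the placeholder <record> XML are all assumptions
made for the example; they are not pycsw's model or GeoNode's signal code.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Record:
    """Hypothetical record holding only the fields that get queried."""
    identifier: str
    title: str
    abstract: str = ""

    @property
    def xml(self):
        """Virtual field: the XML is built on access, never stored or queried."""
        return (
            "<record>"
            "<identifier>%s</identifier>"
            "<title>%s</title>"
            "<abstract>%s</abstract>"
            "</record>"
        ) % (escape(self.identifier), escape(self.title), escape(self.abstract))


def prepare_for_insert(record):
    """Alternative: materialize the XML right before the row is written."""
    return {
        "identifier": record.identifier,
        "title": record.title,
        "abstract": record.abstract,
        "xml": record.xml,  # constructed at insert time
    }


row = prepare_for_insert(Record("abc-123", "Test & preview"))
```

Either way the XML is derived from the same source fields, so it cannot drift
out of sync with them as long as every write goes through the same hook.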


> Hope this makes sense,
>
> Adrià
>
>
>
>>
>> Is this because we are unable to yield formal metadata (ISO, FGDC) XML
>> from native CKAN local datasets?  If yes, if we had a JSON to ISO XML
>> metadata converter for local datasets only (whereas harvested datasets
>> would already be in a formal metadata standard), would this work?
>>
>
>> ..Tom
>
>
> [1] http://demo.ckan.org/api/3/action/package_show?id=test-preview-geojson
>
>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
>


