[ckan-dev] ckanext-spatial and pycsw synchronization workflow

Fri Nov 29 20:31:50 UTC 2013

On Fri, Nov 22, 2013 at 7:23 AM, Adrià Mercader <adria.mercader at okfn.org> wrote:
> On 22 November 2013 05:48, Tom Kralidis <tomkralidis at gmail.com> wrote:
>> On Wed, Nov 20, 2013 at 7:45 PM, Ryan Clark <ryan.clark at azgs.az.gov> wrote:
>>> To be clear, is the real goal here ISO-compliant metadata, or is it access to CKAN sites through the CSW API?
>>>
>
> Ideally both, although they can happen in different stages. The
> feedback we've got from users is that they would like to be able to
> create ISO metadata records from CKAN (lots of portals have some
> relation to INSPIRE), so I'd like to see efforts on the CSW front be
> based on ISO if possible. Also, all support for spatial harvesting in
> CKAN is based around ISO.
>

This would involve efforts outside pycsw, i.e. expanding the CKAN
model to provide a useful ISO document which pycsw can then grok. Are
there timelines or existing efforts in this direction?

Implementing a DC JSON->XML template would provide the quick win of
exposing local CKAN records via CSW, and provide
opportunities/lessons learned (and a stable approach) for when ISO
JSON->XML efforts materialize, making efforts in this direction
evolutionary (not revolutionary).  Keep in mind that CSW itself baselines
on csw:Record (http://schemas.opengis.net/csw/2.0.2/record.xsd), so
this would provide the added benefit of federated search across
catalogues, whether they implement APISO or not.
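
As a rough illustration of the DC JSON->XML idea, here is a minimal
sketch that turns a CKAN package dict into a csw:Record. The field
choices (id, title, tags, notes, metadata_modified) and the function
name are my assumptions for illustration only, not an agreed mapping
or anything that exists in ckanext-spatial or pycsw today.

```python
import xml.etree.ElementTree as ET

# Namespaces used by csw:Record (CSW 2.0.2 plus Dublin Core).
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (NS[prefix], tag)


def package_to_csw_record(pkg):
    """Hypothetical helper: map a CKAN package dict to csw:Record XML.

    The CKAN-field-to-DC-element mapping below is an assumption.
    """
    record = ET.Element(q("csw", "Record"))
    ET.SubElement(record, q("dc", "identifier")).text = pkg["id"]
    title = pkg.get("title") or pkg.get("name", "")
    ET.SubElement(record, q("dc", "title")).text = title
    ET.SubElement(record, q("dc", "type")).text = "dataset"
    # Each CKAN tag becomes one dc:subject element.
    for tag in pkg.get("tags", []):
        ET.SubElement(record, q("dc", "subject")).text = tag["name"]
    if pkg.get("metadata_modified"):
        modified = ET.SubElement(record, q("dct", "modified"))
        modified.text = pkg["metadata_modified"]
    if pkg.get("notes"):
        ET.SubElement(record, q("dct", "abstract")).text = pkg["notes"]
    return ET.tostring(record, encoding="unicode")
```

An ISO version would presumably need many more inputs (contacts,
extent, dates and so on), which is why the DC pass looks like the
lighter first iteration.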

>
>> Very good question.  Thinking about this more, in GeoNode we took ISO
>> as our base model for metadata. In OpenDataCatalog, we used DC. In
>> both cases pycsw binds directly against the underlying database,
>> stores a full XML document in a column, and the downstream
>> application's columns are mapped to pycsw's model by way of a Python
>> dict.
> One big difference with these catalogs, and the thing that makes it
> difficult to implement a nice solution, is that CKAN was not designed
> from the start as a catalog for geospatial data with a specific model
> in mind. Its model has a set of minimal fields plus arbitrary fields
> (extras) with the idea that developers can extend it or adapt it to
> their own needs (and people use all kinds of models).

Agreed.  This makes an initial DC JSON->XML effort a fast/light/simple iteration.

> Binding directly into CKAN's tables would be difficult as the necessary fields
> are spread across different tables. I think working at a higher level
> with the results returned by the logic layer (basically what you see
> on an API call) will make things easier.
>
>
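
Working from the logic layer output could look roughly like the
sketch below: take the dict an action API call such as package_show
returns and flatten the extras into it, so a template has one place
to look up values. The sample response is made up for illustration,
and flatten_package is a hypothetical helper, not existing CKAN code.

```python
# Made-up example of the shape returned by a CKAN action API call:
# core fields plus a list of arbitrary key/value extras.
SAMPLE_RESPONSE = {
    "success": True,
    "result": {
        "id": "abc-123",
        "name": "test-dataset",
        "title": "Test dataset",
        "extras": [
            {"key": "spatial",
             "value": '{"type": "Point", "coordinates": [-110.9, 32.2]}'},
            {"key": "contact-email", "value": "someone@example.org"},
        ],
    },
}


def flatten_package(response):
    """Hypothetical helper: merge a package's extras into its core fields."""
    if not response.get("success"):
        raise ValueError("CKAN API call did not succeed")
    pkg = response["result"]
    flat = {key: value for key, value in pkg.items() if key != "extras"}
    for extra in pkg.get("extras", []):
        # Core fields win if an extra happens to reuse one of their names.
        flat.setdefault(extra["key"], extra["value"])
    return flat
```

Since sites use extras in all kinds of ways, which keys feed which
metadata elements would still have to be configured per site.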
>> Having said this, Ryan's point clarifies things (a bit, for me at
>> least).  I think we should provide DC as the goal of CKAN CSW support
>> of local records, the advantage being pycsw runs read-only atop the
>> CKAN database.  Harvested records, being in ISO already, can still
>> exist and pycsw converts them on the fly to DC if needed.
> I understand that this could simplify things, but see above for my
> preference for ISO. Again, it would be really useful to have a list of
> stuff that we would need to generate a valid ISO doc (or DC),
> regardless of where we would get it from in CKAN. If ISO turned out to be
> a huge pain to generate we could use DC and switch to generating ISO
> later on, but I'd rather avoid the double work.
>
>
>> If this is an iteration or two away or needs more thought, we could
>> move ahead with the current sync approach and extend it to support
>> local records, as a near term (easier) quick win.
>>
>> Comments?
> Sounds great, let's focus on what we would need to generate ISO
> records from local CKAN records.
>
> Great discussion, thanks
>
> Adrià