[ckan-dev] ckanext-spatial and pycsw synchronization workflow

Sat Nov 23 14:51:41 UTC 2013

> it would be really useful to have a list of stuff that we would need to generate a valid ISO doc

Seriously.

The problems with the ISO standards are many, but perhaps the most significant is that I'm not sure what you're asking for exists. The XSDs (http://schemas.opengis.net/iso/19139/) define the syntax of the documents, and will tell you things like "the XML doc has to have a <gmd:title> element, and that has to contain a <gco:CharacterString>", but that doesn't stop the following snippet from being valid:

<gmd:title>
  <gco:CharacterString></gco:CharacterString>
</gmd:title>
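To make the point concrete, here is a small Python sketch (standard library only) that parses the snippet above, confirms it has the title-contains-CharacterString structure, and then finds that the string is empty. This is not XSD validation; it only illustrates the kind of content check that schema validation leaves out.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the gmd and gco prefixes.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

SNIPPET = """
<gmd:title xmlns:gmd="http://www.isotc211.org/2005/gmd"
           xmlns:gco="http://www.isotc211.org/2005/gco">
  <gco:CharacterString></gco:CharacterString>
</gmd:title>
"""


def empty_strings(root):
    """Return every gco:CharacterString under root (root included) with no text."""
    tag = "{%s}CharacterString" % NS["gco"]
    return [el for el in root.iter(tag) if not (el.text or "").strip()]


root = ET.fromstring(SNIPPET.strip())
# The structure is what the schema asks for: a title containing a CharacterString...
has_string = root.find("gco:CharacterString", NS) is not None
# ...but the only string in it is empty.
print(has_string, len(empty_strings(root)))  # prints: True 1
```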

On top of that, if you want to read the ISO spec, it means buying (seriously!!) three PDF docs, for 19115, 19119 and 19139. Each one is prose, sometimes rambling on for hundreds of pages, and none provides anything close to a list of required content.

So: it turns out you can make an XSD-valid but useless ISO 19139 metadata record with close to no content (nil everything). I'm going to backtrack on my last post here: what probably matters more is understanding the APISO profile for CSW and the requirements it places on the ISO content through its core queryables and core returnables.

The first hit I get in Google for "csw core returnables" is actually the output of a validation application that they use at NOAA: http://www.ngdc.noaa.gov/docucomp/page?xml=NOAA/NESDIS/NGDC/MGG/Geology/iso/xml/G00028.xml&view=CSWRubricHTML

While that page doesn't come from the OGC or ISO, Ted Habermann at NOAA is definitely a reliable source. I think you could treat its three tables as the minimum content that an ISO record in a CSW service should have.
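One way to act on this is a content check layered on top of XSD validation. The sketch below uses a hand-picked, hypothetical checklist (file identifier, title, abstract, west bounding longitude); it is not the APISO list of core queryables or returnables, and the NOAA rubric linked above is the better source for what belongs on it. The paths follow the usual ISO 19139 nesting.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical checklist for illustration; NOT the official APISO list.
CHECKS = {
    "identifier": "gmd:fileIdentifier/gco:CharacterString",
    "title": ("gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/"
              "gmd:title/gco:CharacterString"),
    "abstract": "gmd:identificationInfo/*/gmd:abstract/gco:CharacterString",
    "west_bound": ("gmd:identificationInfo/*/gmd:extent/gmd:EX_Extent/"
                   "gmd:geographicElement/gmd:EX_GeographicBoundingBox/"
                   "gmd:westBoundLongitude/gco:Decimal"),
}


def missing_content(xml_text):
    """Return checklist names whose element is absent or has empty text."""
    root = ET.fromstring(xml_text)
    missing = []
    for name, path in CHECKS.items():
        el = root.find(path, NS)
        if el is None or not (el.text or "").strip():
            missing.append(name)
    return missing


# A record with an identifier but an empty title, no abstract and no extent.
RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier><gco:CharacterString>abc-123</gco:CharacterString></gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation><gmd:CI_Citation>
        <gmd:title><gco:CharacterString></gco:CharacterString></gmd:title>
      </gmd:CI_Citation></gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""

print(missing_content(RECORD))  # prints: ['title', 'abstract', 'west_bound']
```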

Ryan

________________________________________
From: Adrià Mercader <adria.mercader at okfn.org>
Sent: Friday, November 22, 2013 5:23 AM
To: Tom Kralidis
Cc: Ryan Clark; CKAN Development Discussions
Subject: Re: [ckan-dev] ckanext-spatial and pycsw synchronization workflow

On 22 November 2013 05:48, Tom Kralidis <tomkralidis at gmail.com> wrote:
> On Wed, Nov 20, 2013 at 7:45 PM, Ryan Clark <ryan.clark at azgs.az.gov> wrote:
>> To be clear, is the real goal here ISO-compliant metadata, or is it access to CKAN sites through the CSW API?
>>

Ideally both, although they can happen in different stages. The
feedback we've got from users is that they would like to be able to
create ISO metadata records from CKAN (lots of portals have some
relation to INSPIRE), so I'd like to see efforts on the CSW front be
based on ISO if possible. Also, all of CKAN's spatial harvesting
support is based around ISO.

> Very good question.  Thinking about this more, in GeoNode we took ISO
> as our base model for metadata. In OpenDataCatalog, we used DC. In
> both cases pycsw binds directly against the underlying database,
> stores a full XML document in a column, and the downstream
> application's columns are mapped to pycsw's model by way of a Python
> dict.
One big difference between these catalogs and CKAN, and the thing that
makes it difficult to implement a nice solution, is that CKAN was not
designed from the start as a catalog for geospatial data with a
specific model in mind. Its model has a set of minimal fields plus
arbitrary fields (extras), with the idea that developers can extend it
or adapt it to their own needs (and people use all kinds of models).
Binding directly to CKAN's tables would be difficult, as the necessary
fields are spread across different tables. I think working at a higher
level, with the results returned by the logic layer (basically what you
see in an API call), will make things easier.
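A minimal sketch of that higher-level approach, combined with the dict-based mapping Tom describes: each target field is computed from the dataset dict returned by the logic layer (as from package_show) instead of from database columns. It assumes the package fields id, title, notes, metadata_modified, tags and extras, and a GeoJSON Polygon in the "spatial" extra as used by ckanext-spatial. The pycsw:* key names are modelled on pycsw's naming for illustration only; pycsw's own repository mapping format should be checked before relying on them.

```python
import json


def _extra(pkg, key):
    """Return the value of a CKAN extra, or None if it is not set."""
    for item in pkg.get("extras", []):
        if item.get("key") == key:
            return item.get("value")
    return None


def _bbox(pkg):
    """(minx, miny, maxx, maxy) from a GeoJSON Polygon in the 'spatial' extra."""
    raw = _extra(pkg, "spatial")
    if not raw:
        return None
    geom = json.loads(raw)
    if geom.get("type") != "Polygon":  # sketch: only Polygons are handled
        return None
    ring = geom["coordinates"][0]  # outer ring
    xs = [pt[0] for pt in ring]
    ys = [pt[1] for pt in ring]
    return (min(xs), min(ys), max(xs), max(ys))


# Illustrative mapping: target field name -> function of the package dict.
MAPPING = {
    "pycsw:Identifier": lambda p: p.get("id"),
    "pycsw:Title": lambda p: p.get("title"),
    "pycsw:Abstract": lambda p: p.get("notes"),
    "pycsw:Modified": lambda p: p.get("metadata_modified"),
    "pycsw:Keywords": lambda p: ",".join(t["name"] for t in p.get("tags", [])),
    "pycsw:BoundingBox": _bbox,
}


def flatten(pkg):
    """Apply MAPPING to one package dict, giving a flat record."""
    return {name: getter(pkg) for name, getter in MAPPING.items()}


pkg = {
    "id": "abc-123",
    "title": "Example dataset",
    "notes": "A short description",
    "metadata_modified": "2013-11-20T10:00:00",
    "tags": [{"name": "water"}, {"name": "wells"}],
    "extras": [{"key": "spatial", "value": '{"type": "Polygon", "coordinates": '
                '[[[-115, 31], [-109, 31], [-109, 37], [-115, 37], [-115, 31]]]}'}],
}
print(flatten(pkg))
```

Using functions rather than column names as the mapping values is what makes this fit CKAN: a field like the bounding box has to be derived from an extra rather than read from a single column.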

> Having said this, Ryan's point clarifies things (a bit, for me at
> least).  I think we should provide DC as the goal of CKAN CSW support
> of local records, the advantage being pycsw runs read-only atop the
> CKAN database.  Harvested records, being in ISO already, can still
> exist and pycsw converts them on the fly to DC if needed.
I understand that this could simplify things, but see above for my
preference for ISO. Again, it would be really useful to have a list of
stuff that we would need to generate a valid ISO doc (or DC),
regardless of where we would get it from in CKAN. If ISO turned out to
be a huge pain to generate, we could use DC and switch to generating
ISO later on, but I'd rather avoid the double work.
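For comparison, if DC were the first step, a basic csw:Record is small to generate. The sketch below uses the standard CSW 2.0.2, Dublin Core and DC Terms namespaces; the input dict keys (identifier, title, abstract, modified, keywords) are hypothetical, and the choice of elements is an assumption rather than a complete mapping.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
DC = "http://purl.org/dc/elements/1.1/"
DCT = "http://purl.org/dc/terms/"

# Register prefixes so the serialized XML uses csw:, dc: and dct:.
ET.register_namespace("csw", CSW)
ET.register_namespace("dc", DC)
ET.register_namespace("dct", DCT)


def dc_record(rec):
    """Build a csw:Record string from a flat dict (hypothetical keys)."""
    root = ET.Element("{%s}Record" % CSW)

    def add(ns, tag, value):
        # Skip missing values rather than emitting empty elements.
        if value:
            ET.SubElement(root, "{%s}%s" % (ns, tag)).text = value

    add(DC, "identifier", rec.get("identifier"))
    add(DC, "title", rec.get("title"))
    add(DCT, "abstract", rec.get("abstract"))
    add(DCT, "modified", rec.get("modified"))
    for kw in rec.get("keywords", []):
        add(DC, "subject", kw)
    return ET.tostring(root, encoding="unicode")


xml_text = dc_record({
    "identifier": "abc-123",
    "title": "Example dataset",
    "keywords": ["water", "wells"],
})
print(xml_text)
```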

> If this is an iteration or two away or needs more thought, we could
> move ahead with the current sync approach and extend it to support
> local records, as a near term (easier) quick win.
>
> Comments?
Sounds great, let's focus on what we would need to generate ISO
records from local CKAN records.
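As a starting point, here is a rough sketch of an ISO 19139 gmd:MD_Metadata skeleton built from a CKAN dataset dict. The field choices (id to fileIdentifier, metadata_modified to dateStamp, title and notes to the citation title and abstract) are assumptions for illustration. The output is not schema-valid: elements the 19139 XSDs require, such as the metadata contact, are omitted, and those omissions are the part of the list still to be worked out.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _el(parent, ns, tag):
    """Append a namespaced child element and return it."""
    return ET.SubElement(parent, "{%s}%s" % (ns, tag))


def _string(parent, tag, value):
    """Add <gmd:tag><gco:CharacterString>value</...></...>, skipping empty values."""
    # Skipping (instead of writing an empty string) keeps the gap visible
    # to a content check rather than hiding it behind an empty element.
    if value:
        _el(_el(parent, GMD, tag), GCO, "CharacterString").text = value


def iso_skeleton(pkg):
    """Sketch of gmd:MD_Metadata from a CKAN package dict (NOT schema-valid)."""
    md = ET.Element("{%s}MD_Metadata" % GMD)
    _string(md, "fileIdentifier", pkg.get("id"))
    # Required elements such as gmd:contact would go here; omitted in this sketch.
    modified = pkg.get("metadata_modified")
    if modified:
        # dateStamp holds a gco:Date or gco:DateTime.
        _el(_el(md, GMD, "dateStamp"), GCO, "DateTime").text = modified
    ident = _el(_el(md, GMD, "identificationInfo"), GMD, "MD_DataIdentification")
    citation = _el(_el(ident, GMD, "citation"), GMD, "CI_Citation")
    _string(citation, "title", pkg.get("title"))
    _string(ident, "abstract", pkg.get("notes"))
    return md


md = iso_skeleton({
    "id": "abc-123",
    "title": "Example dataset",
    "notes": "A short description",
    "metadata_modified": "2013-11-20T10:00:00",
})
print(ET.tostring(md, encoding="unicode"))
```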

Great discussion, thanks

Adrià