[ckan-discuss] Stuck with CSW Harvesting

David Read david.read at hackneyworkshop.com
Mon Sep 24 22:22:15 BST 2012


Bruce,

This all sounds promising! It's great to hear that you're reusing these too.

These CKAN extensions were written for the UK Location Programme,
which uses Gemini2, which I understand is a variant of ISO19139. It has
three stages of validation, which include ISO19139 and 'schematron';
the schematron stage is specific to Gemini2.

The specific error is in extracting the language field, which sounds
like it is mandatory for Gemini but not for ISO. So the extension would
need to be changed accordingly to allow for this. For this case you
could change the error into a warning, and store a blank value in
CKAN. I expect there will be some other values like this that can
become optional. If you're happy to make the changes on a fork, I'll
happily review and merge them in for the benefit of others.
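
As a rough illustration of the change described above (turning the error
into a warning and storing a blank value), here is a minimal sketch in
Python. It is not the actual ckanext-inspire code: the names
OPTIONAL_ELEMENTS and read_element are hypothetical, and only the
'metadata-language' element name and the error message come from the
traceback in this thread.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical: elements treated as optional for plain ISO19139 records.
# Only 'metadata-language' is known from the traceback; others may follow.
OPTIONAL_ELEMENTS = {"metadata-language"}


def read_element(name, values, optional=OPTIONAL_ELEMENTS):
    """Return the first extracted value for an element.

    A missing optional element is logged as a warning and stored as a
    blank string; a missing mandatory element still raises an error.
    """
    if values:
        return values[0]
    if name in optional:
        log.warning("Value not found for element '%s', storing blank", name)
        return ""
    raise ValueError("Value not found for element '%s'" % name)
```

With something along these lines, a record without a language field would
be imported with an empty language, while a record missing a genuinely
mandatory element would still fail.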

You'll also want to change the schemas that are used to validate, to
avoid the Gemini2 ones.
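
One quick way to check which validation profiles a config file asks for
is sketched below. It assumes the setting lives in an [app:main] section
and holds a whitespace-separated list; both are assumptions for the sake
of the example, and the inline text is a stand-in for a real
development.ini. The function name validator_profiles is hypothetical.

```python
from configparser import ConfigParser

# Inline stand-in for development.ini (the section name is an assumption).
INI_TEXT = """
[app:main]
ckan.plugins = stats harvest ckan_harvester gemini_harvester
ckan.inspire.validator.profiles = iso19139
"""


def validator_profiles(ini_text, section="app:main"):
    """Return the configured validator profiles as a list of names."""
    # Interpolation is disabled so '%' characters in other settings
    # cannot break parsing.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    raw = parser.get(section, "ckan.inspire.validator.profiles", fallback="")
    return raw.split()


profiles = validator_profiles(INI_TEXT)
# Any profile with 'gemini' in its name would be one to avoid here.
gemini_profiles = [p for p in profiles if "gemini" in p.lower()]
print(profiles, gemini_profiles)
```

This only reports what the config file requests; it says nothing about
which schemas the extension actually applies at import time.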

I expect Adrià on this list could help further with the details, since he
has the best knowledge of it all.

David

On 24 September 2012 20:57, Bruce Crevensten <becrevensten at alaska.edu> wrote:
> Hi,
>
> I'm exploring using CKAN as a companion to GeoNetwork for presenting
> geospatial climate data, and I'm having some difficulty getting CKAN
> to harvest from GeoNetwork's CSW service.  Since this thread contained
> a note that was relevant to my situation (specifying the ISO19139
> validator), I'm adding to this thread instead of starting a new one,
> though my issue may be distinct from the original inquiry.
>
> I've installed the ckanext-harvest, ckanext-csw, and ckanext-inspire
> extensions.  I'm running CKAN 1.8 on a CentOS6 virtual machine, using
> a source installation.  GeoNetwork 2.6.4 is running on a different
> CentOS6 machine.  I've not explored the base CKAN install thoroughly,
> but it appears to be stable.
>
> My configuration file (development.ini) has these settings:
>
> ckan.plugins = stats harvest ckan_harvester inspire_api
> gemini_harvester gemini_doc_harvester gemini_waf_harvester
> ckan.inspire.validator.profiles = iso19139
>
> My harvester job is set up to be type 'csw', and the URL endpoint is
> this: http://athena.snap.uaf.edu:8080/geonetwork/srv/en/csw?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&id=4edfbeef-f830-4ce7-b6b1-557592ea8dce
>
> (Side note: I'm a bit unclear if I'm using the correct URL endpoint.
> That URL specifies a single data record, but the harvester appears to
> correctly discover all of our data sets.  ?)
>
> The error messages I'm getting seem to indicate that the fetching is
> working OK, but the gemini profile is being used to validate the
> results, causing validation errors and a failed harvest.
>
> Here's a log excerpt:
>
> 2012-09-24 12:35:53,076 INFO  [ckanext.harvest.queue] Received harvest
> object id: 2701000e-a931-4b57-9fa9-5209ef8be1e5
> 2012-09-24 12:35:53,236 INFO  [ckanext.csw.services] Making CSW
> request: getrecordbyid [u'e3c2e8ea-0896-4011-b11b-f2f941fec941']
> {'esn': 'full', 'outputschema': 'http://www.isotc211.org/2005/gmd'}
> 2012-09-24 12:35:53,485 DEBUG [ckanext.inspire.harvesters] XML content
> saved (len 24601)
> 2012-09-24 12:35:53,492 ERROR [ckanext.inspire.harvesters] Traceback
> (most recent call last):
>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/harvesters.py",
> line 141, in import_stage
>     self.import_gemini_object(harvest_object.content)
>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/harvesters.py",
> line 165, in import_gemini_object
>     package = self.write_package_from_gemini_string(unicode_gemini_string)
>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/harvesters.py",
> line 174, in write_package_from_gemini_string
>     gemini_values = gemini_document.read_values()
>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/model/__init__.py",
> line 19, in read_values
>     values[element.name] = element.read_value(tree)
>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/model/__init__.py",
> line 51, in read_value
>     return self.fix_multiplicity(values)
>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/model/__init__.py",
> line 102, in fix_multiplicity
>     "Value not found for element '%s'" % self.name)
> Exception: Value not found for element 'metadata-language'
> 2012-09-24 12:35:53,494 ERROR [ckanext.inspire.harvesters] Error
> importing Gemini document: Value not found for element
> 'metadata-language'
>
> Is my configuration to import ISO19139 records from GeoNetwork via CSW
> correct, or is there another issue here?
>
> Thanks,
> - Bruce
>
> On Fri, Sep 21, 2012 at 2:24 AM, David Read
> <david.read at hackneyworkshop.com> wrote:
>>
>> Maurizio,
>>
>> ckanext-harvest is just the harvesting framework and is useless on its
>> own. The actual harvester for CSW is contained in ckanext-inspire, so
>> you need to install that too.
>>
>> David
>>
>> On 13 September 2012 17:36, Maurizio Napolitano <napo at fbk.eu> wrote:
>> > On 30/07/2012 12:04, Adrià Mercader wrote:
>> >>
>> >> Hi Simone,
>> >>
>> >> Glad to hear that you are using CKAN for geo-related stuff. We would
>> >> love to hear any feedback that you may have.
>> >>
>> >> In relation to your problem, it looks like you have not loaded the CSW
>> >> harvester extension(s) in your ini file. Can you double-check that you
>> >> have this added to your ini file?
>> >>
>> >> ckan.plugins = gemini_harvester <your other plugins...>
>> >>
>> >> Also make sure to add this line to your ini file to avoid validating
>> >> the metadata records against the gemini profile (which is UK
>> >> specific):
>> >>
>> >> ckan.inspire.validator.profiles = iso19139
>> >>
>> >> If you have already defined the harvester in your ini file, let me
>> >> know, as we will need to investigate a little further (also try
>> >> restarting the consumers).
>> >
>> >
>> >
>> > Hi Adrià,
>> > I used this configuration and, if I go to
>> > http://myckaninstallation/harvest
>> > I can add a CSW server, but the answer is always
>> >
>> > Last Harvest Errors: 1
>> > Gathering errors
>> >
>> >     No harvester could be found for source type csw
>> >
>> > I tested it with some CSW services like:
>> > - http://www.pcn.minambiente.it/geoportal/csw
>> > - http://datigis.comune.fi.it/geonetwork/srv/it/csw
>> >
>> > ... and in both cases I get this answer.
>> >
>> > Where is my error?
>> >
>> > Thanks
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > ckan-discuss mailing list
>> > ckan-discuss at lists.okfn.org
>> > http://lists.okfn.org/mailman/listinfo/ckan-discuss
>>
>
>
>
>
> --
>
> Bruce Crevensten, Web Programmer
> Scenarios Network for Alaska & Arctic Planning
> 3352 College Road, 2nd Floor Denali Building
> Fairbanks, AK 99709
> Phone: 907-474-7134
> Fax: 907-474-7151
> www.snap.uaf.edu
> becrevensten at alaska.edu


