[ckan-discuss] Stuck with CSW Harvesting
Angelos Tzotsos
gcpp.kalxas at gmail.com
Wed Sep 26 12:58:38 BST 2012
Hi Adria,
We can schedule a meeting when you get back and get this started.
As for harvesting, I'm fine with sticking to the existing
implementation. Thanks for clarifying.
Cheers,
Angelos
On 09/26/2012 01:18 PM, Adrià Mercader wrote:
> Hi Angelos,
>
> I'm really glad to hear that you guys are planning to work on CKAN
> integration; in fact, we were also planning to start work in this area
> at some point in the near future. I'm away at a conference for the rest of
> this week, but I'm happy to meet at some point in the next few weeks to get
> things started. Let's talk then and plan the next steps.
>
> Just to be clear, in a first stage we see the pycsw integration as a way to
> give CKAN a fully fledged CSW interface, but the harvesting mechanism will
> still use the current implementation (with the enhancements mentioned, of
> course), so the issues raised in this thread are not related to the
> future pycsw integration.
>
> Hope this makes it clearer.
>
>
> Adrià
>
>
> On 25 September 2012 18:28, Angelos Tzotsos <gcpp.kalxas at gmail.com> wrote:
>> Hi all,
>>
>> I am planning to start working on CKAN-pycsw integration in the near future,
>> hoping to put an end to problems like the ones mentioned here.
>> We released pycsw 1.4.0 three weeks ago, and it is now available from PyPI
>> as a library (now with WSGI support).
>>
>> I need to tackle the metadata model mapping in order to make it work.
>>
>> Cheers,
>> Angelos
>>
>> PS. https://github.com/geopython/pycsw/issues/73
>>
>>
>>
>> On 09/25/2012 12:29 PM, Adrià Mercader wrote:
>>> Hi Bruce,
>>>
>>> I'm glad to hear that you are exploring using CKAN alongside CSW. This
>>> is something we want to improve, and any feedback on it is greatly
>>> appreciated.
>>>
>>> As David already mentioned, the CSW-related extensions were written in
>>> the context of the UK Location Project, so the schemas and field model
>>> are based on the Gemini2 profile (itself based on the INSPIRE
>>> regulations). We are working on making these schemas more generic so
>>> that they support any ISO 19139-based document. Right now those changes
>>> live in specific branches that you will need to check out in a couple
>>> of extensions (sorry about the slightly different branch names):
>>>
>>> * ckanext-inspire: git checkout harvest-generic-iso
>>> * ckanext-csw: git checkout generic-iso-support
>>>
>>> This should get rid of the metadata-language error. Let us know if you
>>> hit further errors after this (make sure to restart the gather and fetch
>>> consumers after checking out the branches).
>>>
>>> BTW, we also plan to consolidate all this functionality into a single
>>> extension.
>>>
>>> Hope this helps,
>>>
>>> Adrià
>>>
>>>
>>>
>>>
>>> On 24 September 2012 20:57, Bruce Crevensten <becrevensten at alaska.edu> wrote:
>>>> Hi,
>>>>
>>>> I'm exploring using CKAN as a companion to GeoNetwork for presenting
>>>> geospatial climate data, and I'm having some difficulty getting CKAN
>>>> to harvest from GeoNetwork's CSW service. Since this thread contained
>>>> a note that was relevant to my situation (specifying the ISO19139
>>>> validator), I'm adding to this thread instead of starting a new one,
>>>> though my issue may be distinct from the original inquiry.
>>>>
>>>> I've installed the ckanext-harvest, ckanext-csw, and ckanext-inspire
>>>> extensions. I'm running CKAN 1.8 on a CentOS6 virtual machine, using
>>>> a source installation. GeoNetwork 2.6.4 is running on a different
>>>> CentOS6 machine. I've not explored the base CKAN install thoroughly,
>>>> but it appears to be stable.
>>>>
>>>> My configuration file (development.ini) has these settings:
>>>>
>>>> ckan.plugins = stats harvest ckan_harvester inspire_api gemini_harvester gemini_doc_harvester gemini_waf_harvester
>>>> ckan.inspire.validator.profiles = iso19139
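To double-check settings like these outside CKAN, a small script can read the ini file and report what would be loaded. This is a minimal sketch, not part of CKAN or its extensions: it assumes the settings live in the `[app:main]` section, that `ckan.plugins` is whitespace-separated, that `ckan.inspire.validator.profiles` is a comma-separated list, and (following Adrià's advice later in the thread) that `gemini_harvester` is the plugin providing the CSW harvester. The helper name `harvest_settings` is hypothetical.

```python
import configparser


def harvest_settings(ini_text: str, section: str = "app:main") -> dict:
    """Summarise the harvesting-related settings of a CKAN ini file (hypothetical helper)."""
    # interpolation=None so values such as %(here)s are read literally instead of raising
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # ckan.plugins is a whitespace-separated list of plugin names
    plugins = parser.get(section, "ckan.plugins", fallback="").split()
    # assumption: the validator profiles setting is a comma-separated list
    raw_profiles = parser.get(section, "ckan.inspire.validator.profiles", fallback="")
    profiles = [p.strip() for p in raw_profiles.split(",") if p.strip()]
    return {
        "plugins": plugins,
        # assumption: gemini_harvester is the plugin that handles 'csw' sources
        "csw_harvester_loaded": "gemini_harvester" in plugins,
        "validator_profiles": profiles,
    }


example_ini = """
[app:main]
ckan.plugins = stats harvest ckan_harvester inspire_api gemini_harvester gemini_doc_harvester gemini_waf_harvester
ckan.inspire.validator.profiles = iso19139
"""

settings = harvest_settings(example_ini)
print(settings["csw_harvester_loaded"], settings["validator_profiles"])
# prints: True ['iso19139']
```

With this parser, a plugin list wrapped across two lines only parses if the continuation line is indented.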
>>>>
>>>> My harvester job is set up to be type 'csw', and the URL endpoint is
>>>> this:
>>>>
>>>> http://athena.snap.uaf.edu:8080/geonetwork/srv/en/csw?request=GetRecordById&service=CSW&version=2.0.2&elementSetName=full&id=4edfbeef-f830-4ce7-b6b1-557592ea8dce
>>>> (Side note: I'm not sure whether I'm using the correct URL endpoint.
>>>> That URL specifies a single record, yet the harvester appears to
>>>> discover all of our datasets correctly. Is that expected?)
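The log excerpt in this message shows the harvester issuing its own `getrecordbyid` request for a different record id, which suggests it builds requests from the endpoint rather than replaying the query string. Assuming that, the base service URL is probably all the harvest source needs. The following sketch (Python 3 standard library only; function names hypothetical) strips the query string from the URL above and rebuilds a CSW 2.0.2 GetRecordById request in key-value-pair form, using the same element set and output schema that appear in the log.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def csw_base_url(url: str) -> str:
    """Keep scheme, host, port and path; drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def get_record_by_id_url(base_url: str, record_id: str) -> str:
    """Build a CSW 2.0.2 GetRecordById request using KVP (query string) encoding."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecordById",
        "id": record_id,
        "elementSetName": "full",
        # ask for ISO 19139 (gmd) output, like the 'outputschema' in the harvester log
        "outputSchema": "http://www.isotc211.org/2005/gmd",
    }
    return base_url + "?" + urlencode(params)


record_url = (
    "http://athena.snap.uaf.edu:8080/geonetwork/srv/en/csw"
    "?request=GetRecordById&service=CSW&version=2.0.2"
    "&elementSetName=full&id=4edfbeef-f830-4ce7-b6b1-557592ea8dce"
)
base = csw_base_url(record_url)
print(base)  # http://athena.snap.uaf.edu:8080/geonetwork/srv/en/csw
print(get_record_by_id_url(base, "e3c2e8ea-0896-4011-b11b-f2f941fec941"))
```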
>>>>
>>>> The error messages I'm getting seem to indicate that the fetching is
>>>> working OK, but the gemini profile is being used to validate the
>>>> results, causing validation errors and a failed harvest.
>>>>
>>>> Here's a log excerpt:
>>>>
>>>> 2012-09-24 12:35:53,076 INFO [ckanext.harvest.queue] Received harvest object id: 2701000e-a931-4b57-9fa9-5209ef8be1e5
>>>> 2012-09-24 12:35:53,236 INFO [ckanext.csw.services] Making CSW request: getrecordbyid [u'e3c2e8ea-0896-4011-b11b-f2f941fec941'] {'esn': 'full', 'outputschema': 'http://www.isotc211.org/2005/gmd'}
>>>> 2012-09-24 12:35:53,485 DEBUG [ckanext.inspire.harvesters] XML content saved (len 24601)
>>>> 2012-09-24 12:35:53,492 ERROR [ckanext.inspire.harvesters] Traceback (most recent call last):
>>>>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/harvesters.py", line 141, in import_stage
>>>>     self.import_gemini_object(harvest_object.content)
>>>>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/harvesters.py", line 165, in import_gemini_object
>>>>     package = self.write_package_from_gemini_string(unicode_gemini_string)
>>>>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/harvesters.py", line 174, in write_package_from_gemini_string
>>>>     gemini_values = gemini_document.read_values()
>>>>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/model/__init__.py", line 19, in read_values
>>>>     values[element.name] = element.read_value(tree)
>>>>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/model/__init__.py", line 51, in read_value
>>>>     return self.fix_multiplicity(values)
>>>>   File "/root/ckan/src/ckanext-inspire/ckanext/inspire/model/__init__.py", line 102, in fix_multiplicity
>>>>     "Value not found for element '%s'" % self.name)
>>>> Exception: Value not found for element 'metadata-language'
>>>> 2012-09-24 12:35:53,494 ERROR [ckanext.inspire.harvesters] Error importing Gemini document: Value not found for element 'metadata-language'
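The traceback shows the failure happens while reading values out of a record that was already fetched (`read_values` and `fix_multiplicity`), so the Gemini model apparently treats 'metadata-language' as mandatory. To check a saved record by hand, the sketch below looks for that value. It assumes, without confirmation from this thread, that 'metadata-language' corresponds to the `gmd:language` child of `gmd:MD_Metadata`, encoded either as a `gmd:LanguageCode` (via `codeListValue` or its text) or as a `gco:CharacterString`; the function name `metadata_language` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def metadata_language(xml_text: str) -> Optional[str]:
    """Return the record-level language of an ISO 19139 document, or None if absent."""
    root = ET.fromstring(xml_text)
    # Accept either a bare gmd:MD_Metadata or a CSW response that wraps one
    if root.tag == "{%s}MD_Metadata" % NS["gmd"]:
        md = root
    else:
        md = root.find(".//gmd:MD_Metadata", NS)
    if md is None:
        return None
    # Direct child only: a gmd:language under identificationInfo is the dataset language
    lang = md.find("gmd:language", NS)
    if lang is None:
        return None
    code = lang.find("gmd:LanguageCode", NS)
    if code is not None:
        value = code.get("codeListValue") or (code.text or "").strip()
        return value or None
    text = lang.findtext("gco:CharacterString", default="", namespaces=NS).strip()
    return text or None


GMD = ('xmlns:gmd="http://www.isotc211.org/2005/gmd" '
       'xmlns:gco="http://www.isotc211.org/2005/gco"')
with_code = ('<gmd:MD_Metadata %s><gmd:language><gmd:LanguageCode codeList="" '
             'codeListValue="eng">eng</gmd:LanguageCode></gmd:language></gmd:MD_Metadata>' % GMD)
without_language = "<gmd:MD_Metadata %s/>" % GMD
print(metadata_language(with_code))         # eng
print(metadata_language(without_language))  # None
```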
>>>>
>>>> Is my configuration to import ISO19139 records from GeoNetwork via CSW
>>>> correct, or is there another issue here?
>>>>
>>>> Thanks,
>>>> - Bruce
>>>>
>>>> On Fri, Sep 21, 2012 at 2:24 AM, David Read <david.read at hackneyworkshop.com> wrote:
>>>>> Maurizio,
>>>>>
>>>>> ckanext-harvest is just the harvesting framework and is useless on its
>>>>> own. The actual harvester for CSW is contained in ckanext-inspire, so
>>>>> you need to install that too.
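David's point, that ckanext-harvest only provides the framework, also fits the "No harvester could be found for source type csw" error quoted further down the thread: the framework needs a loaded plugin that claims the source type. The sketch below models that lookup in plain Python as an illustration only. The class and function names are hypothetical, and the idea that a harvester advertises its source type through an `info()` dictionary with a `name` key is an assumption about ckanext-harvest's plugin interface, not something stated in this thread.

```python
from typing import Dict, Iterable, Optional, Protocol


class Harvester(Protocol):
    def info(self) -> Dict[str, str]: ...


class HypotheticalCswHarvester:
    """Stand-in for the CSW harvester plugin that ckanext-inspire would provide."""

    def info(self) -> Dict[str, str]:
        return {"name": "csw", "title": "CSW Server"}


def find_harvester(plugins: Iterable[Harvester], source_type: str) -> Optional[Harvester]:
    """Return the first loaded plugin whose advertised name matches the source type."""
    for plugin in plugins:
        if plugin.info().get("name") == source_type:
            return plugin
    return None


def gather(plugins: Iterable[Harvester], source_type: str) -> Harvester:
    harvester = find_harvester(plugins, source_type)
    if harvester is None:
        # Mirrors the gathering error reported in the thread
        raise LookupError("No harvester could be found for source type %s" % source_type)
    return harvester


# Only the framework loaded: the lookup fails
try:
    gather([], "csw")
except LookupError as err:
    print(err)  # No harvester could be found for source type csw

# With a plugin that claims the 'csw' type, the lookup succeeds
print(gather([HypotheticalCswHarvester()], "csw").info()["title"])  # CSW Server
```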
>>>>>
>>>>> David
>>>>>
>>>>> On 13 September 2012 17:36, Maurizio Napolitano <napo at fbk.eu> wrote:
>>>>>> On 30/07/2012 12:04, Adrià Mercader wrote:
>>>>>>> Hi Simone,
>>>>>>>
>>>>>>> Glad to hear that you are using CKAN for geo-related stuff. We would
>>>>>>> love to hear any feedback that you may have.
>>>>>>>
>>>>>>> In relation to your problem, it looks like you have not loaded the CSW
>>>>>>> harvester extension(s) in your ini file. Can you double-check that you
>>>>>>> have added this to your ini file?
>>>>>>>
>>>>>>> ckan.plugins = gemini_harvester <your other plugins...>
>>>>>>>
>>>>>>> Also make sure to add this line to your ini file to avoid validating
>>>>>>> the metadata records against the Gemini profile (which is
>>>>>>> UK-specific):
>>>>>>>
>>>>>>> ckan.inspire.validator.profiles = iso19139
>>>>>>>
>>>>>>> If you have already defined the harvester in your ini file, let me
>>>>>>> know, as we will need to investigate a little further (also try
>>>>>>> restarting the consumers).
>>>>>>
>>>>>>
>>>>>> Hi Adrià,
>>>>>> I used this configuration, and if I go to
>>>>>> http://myckaninstallation/harvest
>>>>>> I can add a CSW server, but... the response is always
>>>>>>
>>>>>> Last Harvest Errors: 1
>>>>>> Gathering errors
>>>>>>
>>>>>> No harvester could be found for source type csw
>>>>>>
>>>>>> I tested it with some CSW services, such as:
>>>>>> - http://www.pcn.minambiente.it/geoportal/csw
>>>>>> - http://datigis.comune.fi.it/geonetwork/srv/it/csw
>>>>>>
>>>>>> ...and in both cases I get this response.
>>>>>>
>>>>>> Where am I going wrong?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> ckan-discuss mailing list
>>>>>> ckan-discuss at lists.okfn.org
>>>>>> http://lists.okfn.org/mailman/listinfo/ckan-discuss
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Bruce Crevensten, Web Programmer
>>>> Scenarios Network for Alaska & Arctic Planning
>>>> 3352 College Road, 2nd Floor Denali Building
>>>> Fairbanks, AK 99709
>>>> Phone: 907-474-7134
>>>> Fax: 907-474-7151
>>>> www.snap.uaf.edu
>>>> becrevensten at alaska.edu
>>>>
>>>
>>
>> --
>> Angelos Tzotsos
>> Remote Sensing Laboratory
>> National Technical University of Athens
>> http://users.ntua.gr/tzotsos
>>
>>
>>