[ckan-dev] CSW Harvester schema invalid

Florian Hoedt florian.hoedt at thuenen.de
Thu Sep 6 13:18:04 UTC 2018


Thanks, Ross, for pointing me in the right direction.
After pip installing dev-requirements.txt, the test run started. It showed a lot of schema validation errors. I am not sure why that is an issue, since my harvester is configured to use the iso19139 schema.
Snippet from production.ini:
# ckan harvester
ckan.harvest.mq.type = redis
ckan.spatial.validator.profiles = iso19139

Snippet from the log:
2018-09-06 14:56:05,245 DEBUG [ckanext.spatial.harvesters.csw.CSW.fetch] CswHarvester fetch_stage for object: 3c6cbfa5-765d-4b8b-a9e8-b291e5e8ac3f
2018-09-06 14:56:05,843 INFO  [ckanext.spatial.lib.csw_client] Making CSW request: getrecordbyid [u'urn:uuid:01c9697a-7aea-4620-9d7f-f3be5f5719da-LakeClarity'] {'esn': 'full', 'outputschema': 'http://www.isotc211.org/2005/gmd'}
2018-09-06 14:56:06,250 DEBUG [ckanext.spatial.harvesters.csw.CSW.fetch] XML content saved (len 7452)
2018-09-06 14:56:06,254 DEBUG [ckanext.spatial.harvesters.base.import] Import stage for harvest object: 3c6cbfa5-765d-4b8b-a9e8-b291e5e8ac3f
2018-09-06 14:56:06,256 DEBUG [ckanext.spatial.validation.validation] Starting validation against profile(s) iso19139
2018-09-06 14:56:06,281 INFO  [ckanext.spatial.validation.validation] Validation errors found using schema Dataset schema (gmx.xsd)
2018-09-06 14:56:06,284 INFO  [ckanext.spatial.validation.validation] Validating against "ISO19139 XSD Schema" profile failed
2018-09-06 14:56:06,284 DEBUG [ckanext.spatial.validation.validation] [('Dataset schema (gmx.xsd) Validation Error', None), (u"Element '{http://www.isotc211.org/2005/gco}Date': '' is not a valid value of the union type '{http://www.isotc211.org/2005/gco}Date_Type'.", 13), (u"Element '{http://www.isotc211.org/2005/gmd}MD_DataIdentification', attribute 'id': 'urn:uuid:01c9697a-7aea-4620-9d7f-f3be5f5719da-LakeClarity' is not a valid value of the atomic type 'xs:ID'.", 22), (u"Element '{http://www.isotc211.org/2005/gmd}CI_Citation': Missing child element(s). Expected is one of ( {http://www.isotc211.org/2005/gmd}alternateTitle, {http://www.isotc211.org/2005/gmd}date ).", 24)]
2018-09-06 14:56:06,284 ERROR [ckanext.spatial.harvesters.base] Validation errors found using profile iso19139 for object with GUID urn:uuid:01c9697a-7aea-4620-9d7f-f3be5f5719da-LakeClarity
2018-09-06 14:56:06,287 DEBUG [ckanext.harvest.model] Dataset schema (gmx.xsd) Validation Error
2018-09-06 14:56:06,289 DEBUG [ckanext.harvest.model] Element '{http://www.isotc211.org/2005/gco}Date': '' is not a valid value of the union type '{http://www.isotc211.org/2005/gco}Date_Type'., line 13
2018-09-06 14:56:06,291 DEBUG [ckanext.harvest.model] Element '{http://www.isotc211.org/2005/gmd}MD_DataIdentification', attribute 'id': 'urn:uuid:01c9697a-7aea-4620-9d7f-f3be5f5719da-LakeClarity' is not a valid value of the atomic type 'xs:ID'., line 22
2018-09-06 14:56:06,294 DEBUG [ckanext.harvest.model] Element '{http://www.isotc211.org/2005/gmd}CI_Citation': Missing child element(s). Expected is one of ( {http://www.isotc211.org/2005/gmd}alternateTitle, {http://www.isotc211.org/2005/gmd}date )., line 24
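
The three errors in the log above can be reproduced with a small pre-check on a record. This is only a minimal sketch, not the validator that ckanext-spatial runs: it uses the Python standard library, the function name precheck and the SAMPLE fragment are hypothetical (the fragment is not the real harvested record), and the NCName pattern is a simplified ASCII-only approximation of the rule behind xs:ID, which among other things forbids the colons in a urn:uuid: value.

```python
import re
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"

# Simplified, ASCII-only approximation of NCName (the lexical form of xs:ID):
# starts with a letter or underscore, and contains no colon.
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")

# Hypothetical fragment that mimics the three reported problems.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:dateStamp><gco:Date></gco:Date></gmd:dateStamp>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification id="urn:uuid:01c9697a-7aea-4620-9d7f-f3be5f5719da-LakeClarity">
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>Example title</gco:CharacterString></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def precheck(xml_text):
    """Return a list of human-readable problems found in an ISO 19139 record."""
    root = ET.fromstring(xml_text)
    problems = []

    # 1. gco:Date elements must not be empty.
    for el in root.iter("{%s}Date" % GCO):
        if not (el.text or "").strip():
            problems.append("empty gco:Date")

    # 2. The id attribute is typed xs:ID, so it must look like an NCName.
    for el in root.iter("{%s}MD_DataIdentification" % GMD):
        ident = el.get("id")
        if ident is not None and not NCNAME.match(ident):
            problems.append("id %r is not a valid xs:ID" % ident)

    # 3. gmd:CI_Citation needs at least one gmd:date child.
    for el in root.iter("{%s}CI_Citation" % GMD):
        if el.find("{%s}date" % GMD) is None:
            problems.append("gmd:CI_Citation has no gmd:date child")

    return problems


if __name__ == "__main__":
    for problem in precheck(SAMPLE):
        print(problem)
```

If this reading is right, the errors come from the source record itself rather than from a missing CKAN extension.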


full updated log:
https://gist.github.com/gannebamm/2fb8ac65af9c3193760f64689630a338

So the validation against the iso19139 profile fails. Do I have to install the CSW extension (ckan-pycsw) in addition to ckanext-spatial?
http://docs.ckan.org/projects/ckanext-spatial/en/latest/csw.html#ckan-pycsw

As far as I understood, it is only needed if you want to publish records via CSW.

-- 
MSc Florian Hoedt
Koordinator Geoinformation | Coordinator Geoinformatics

Thünen-Institut, Zentrum für Informationsmanagement | Thünen Institute, Centre for Information Management
Bundesallee 44
38116 Braunschweig

Tel:  +49 531 596-1405
Fax:  +49 531 596-1499
Mail: florian.hoedt at thuenen.de
Web:  www.thuenen.de

The Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries – Thünen Institute in brief – consists of 14 specialized institutes that carry out research and provide policy advice in the fields of economy, ecology and technology.

----- Original Message -----
From: "Ross Jones" <ross at mailbolt.com>
To: "CKAN Development Discussions" <ckan-dev at lists.okfn.org>
Sent: Thursday, 6 September 2018 12:27:13
Subject: Re: [ckan-dev] CKAN 2.8 Harvester not working: 'ImportError: No module named factory'

You probably need to pip install the dev-requirements.txt to run the test
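
One way to confirm which test dependencies are still missing before running run_test is to try importing them. This is a minimal sketch using Python's standard importlib; the helper name missing_modules is hypothetical, and the module list is only an example (factory is the import name of the factory_boy package).

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "factory" is the module the traceback complains about.
    print(missing_modules(["factory"]))
```

An empty list means the import that failed in the traceback should now succeed in that virtualenv.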

Regards

Ross


> -----Original Message-----
> From: ckan-dev <ckan-dev-bounces at lists.okfn.org> On Behalf Of Florian Hoedt
> Sent: 06 September 2018 11:00
> To: CKAN Development Discussions <ckan-dev at lists.okfn.org>
> Subject: [ckan-dev] CKAN 2.8 Harvester not working: 'ImportError: No module named factory'
> 
> Hello List,
> 
> I have read about several issues with CKAN harvesters and version 2.8. After
> failing to harvest a demo CSW instance from pycsw, I tried a run_test, which
> failed with an 'ImportError: No module named factory' trace:
> '''
> Source id: 53622384-08a7-4372-9cb1-3eafceb70367
>       url: http://demo.pycsw.org/services/csw?service=CSW&version=2.0.2&request=GetCapabilities
>      type: csw
>    active: True
> frequency: MANUAL
>      jobs: 1
> 
> (default) administrator at GDIT01-BS:/usr/lib/ckan/default/src/ckanext-harvest$ paster --plugin=ckanext-harvest harvester run_test 53622384-08a7-4372-9cb1-3eafceb70367 --config=/etc/ckan/default/production.ini
> 
> Traceback (most recent call last):
>   File "/usr/lib/ckan/default/bin/paster", line 11, in <module>
>     sys.exit(run())
>   File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
>     invoke(command, command_name, options, args[1:])
>   File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
>     exit_code = runner.run(args)
>   File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
>     result = self.command()
>   File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 177, in command
>     self.run_test_harvest()
>   File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 421, in run_test_harvest
>     from ckanext.harvest.tests import lib
>   File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/tests/lib.py", line 1, in <module>
>     from ckanext.harvest.tests.factories import HarvestSourceObj, HarvestJobObj
>   File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/tests/factories.py", line 1, in <module>
>     import factory
> ImportError: No module named factory
> '''
> 
> full log:
> https://gist.github.com/gannebamm/2fb8ac65af9c3193760f64689630a338
> 
> seems like some extension is not loaded properly?
> 
> thanks
> Florian
> 
> --
> MSc Florian Hoedt
> Koordinator Geoinformation | Coordinator Geoinformatics
> 
> Thünen-Institut, Zentrum für Informationsmanagement | Thünen Institute,
> Centre for Information Management
> Bundesallee 44
> 38116 Braunschweig
> 
> Tel:  +49 531 596-1405
> Fax:  +49 531 596-1499
> Mail: florian.hoedt at thuenen.de
> Web:  www.thuenen.de
> 
> The Johann Heinrich von Thünen Institute, Federal Research Institute for Rural
> Areas, Forestry and Fisheries – Thünen Institute in brief – consists of 14
> specialized institutes that carry out research and provide policy advice in the
> fields of economy, ecology and technology.
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
