[ckan-dev] harvesting csw

David Read david.read at hackneyworkshop.com
Tue Feb 5 17:02:39 UTC 2013


Armin,

1. ckanext-spatial had some model incompatibilities on the master
branch last week, which you may have hit. Update to the latest commit
(c6ac949) and retry. If that is not it, check that your fetch paster
command line specifies the correct CKAN config file. If it is still
not working, send us the exception and any further details.

Harvesting 3500 XML files shouldn't take more than an hour or so, I'd
have thought. Each one requires one request to the GeoNetwork server,
after which the document is stored in the database, retrieved from the
database, parsed and validated as XML, and stored as a CKAN package in
the database. The limiting factor is most likely the CSW get from your
server.
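
As a rough sanity check on that estimate, the arithmetic can be sketched
in Python. The function names are made up for illustration, and the only
inputs are the figures above (3500 records, about one hour). Any
per-record time you plug in should come from your own measurements.

```python
def per_record_budget(total_seconds, records):
    """Seconds available per record if the whole run must fit in total_seconds."""
    return total_seconds / float(records)


def estimated_hours(records, seconds_per_record):
    """Total run time in hours for a measured per-record time."""
    return records * seconds_per_record / 3600.0


# 3500 records in about one hour leaves roughly one second per record
# for the CSW request, the database round trip, parsing and validation.
budget = per_record_budget(3600, 3500)
print("budget per record: %.2f s" % budget)
```

If the measured time per record is well above that budget, timing the
CSW request on its own would show whether the server is the bottleneck.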

2. All changes are tracked, as in any wiki, partly so that you can
'undelete' a deleted dataset, so the name stays reserved going forward.
Options:
* Try doing an undelete (with package_update, I guess)
* Purge the old one and recreate - not possible, since no purge API
has been implemented yet (shame) http://trac.ckan.org/ticket/1832
* Before deleting the old one, rename it out of the way.
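
For the undelete and rename options, a minimal Python sketch against the
action API might look like the following. It is untested against 1.8.1b.
The /api/3/action/ URL prefix, the Authorization header carrying the API
key, the "-deleted" suffix and the helper names (call_action,
rename_then_delete, undelete) are assumptions for illustration. Whether
package_update really accepts a changed state is a guess.

```python
import json
import urllib.request


def call_action(base_url, api_key, action, payload):
    """POST a JSON payload to a CKAN action endpoint and return its 'result'.

    Assumption: the action API lives under /api/3/action/ and takes the
    API key in the Authorization header.
    """
    request = urllib.request.Request(
        "%s/api/3/action/%s" % (base_url.rstrip("/"), action),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


def rename_then_delete(call, name, suffix="-deleted"):
    """Rename a dataset out of the way, then delete it, so `name` is free again.

    `call` is any callable taking (action, payload), e.g. a wrapper
    around call_action with the site URL and API key filled in.
    """
    dataset = call("package_show", {"id": name})
    dataset["name"] = name + suffix  # hypothetical naming scheme
    renamed = call("package_update", dataset)
    call("package_delete", {"id": renamed["id"]})
    return renamed["name"]


def undelete(call, name):
    """Try to revive a deleted dataset by setting its state back to active.

    Assumption: package_update honours the 'state' field for your user.
    """
    dataset = call("package_show", {"id": name})
    dataset["state"] = "active"
    return call("package_update", dataset)


# Example wiring (placeholders, not run here):
# call = lambda action, payload: call_action(
#     "http://localhost:5000", "my-api-key", action, payload)
# rename_then_delete(call, "my-registry-uuid")
```

Passing the transport in as `call` keeps the two helpers easy to try
against a stub before pointing them at a real instance.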

David

On 5 February 2013 07:11, Armin Retterath <armin.retterath at gmail.com> wrote:
> hello list,
>
> first question:
> i'm trying to harvest csw (geonetwork based) into a local ckan 1.8.1b
> instance. i use the master branch of ckan and the master
> ckanext-harvest branch and different branches for ckanext-spatial.
> the harvesting itself seemed to cause no problems last week on thursday.
> the gathering pulls the right ids and the fetching puts the datasets
> into the db. only some problems with validation occur.
> when i tried the same yesterday, the fetching queue threw errors
> because in ckanext-harvest/ckanext/harvest/queue.py the harvest
> object could not be found! it seems to be a problem with pulling the
> harvest object from the database. the fetch_consumer pulls some uuids
> which don't exist any longer (i deleted the table entries in the database
> and reindexed solr!). is there a buggy cache of harvest object uuids
> somewhere which has to be cleared?
>
> the harvesting is very slow. we need to harvest nearly 3500 different
> iso19139 xml files for webmapservices and want to show them in ckan
> ;-) .
>
> second question:
> alternatively i thought about going the other (better) way and using a
> push to publish datasets from our registry to ckan via the api (2+3).
> creation and deletion seem to be possible via the different apis (2
> and 3-action for delete). some further problems exist: if i delete a
> package via the action interface the package is not really deleted but
> the attribute state is set to "deleted". when i want to recreate the
> object with the same name (maybe the local uuid of our registry) i get
> a 403! maybe the package is not owned by my user any longer? how can
> this be prevented? or can i delete the whole package so that i can
> recreate it afterwards? How can i get only those packages that i have
> created myself via the api?
>
> i think the push way is better for keeping the information in sync :-)
>
> thanx in advance
>
> armin



