[ckan-dev] harvesting csw

Armin Retterath armin.retterath at gmail.com
Tue Feb 5 07:11:40 UTC 2013


hello list,

first question:
i'm trying to harvest csw records (geonetwork based) into a local ckan
1.8.1b instance. i use the master branch of ckan, the master branch of
ckanext-harvest, and different branches of ckanext-spatial.
last thursday the harvesting itself seemed to work without problems:
the gathering pulled the right ids and the fetching put the datasets
into the db. only some validation problems occurred.
when i tried the same yesterday, the fetch queue threw errors because
the harvest object could not be found in
ckanext-harvest/ckanext/harvest/queue.py! it seems to be a problem
with pulling the harvest object from the database. the fetch_consumer
pulls some uuids which no longer exist (i deleted the table entries in
the database and reindexed solr!). is there a buggy cache of harvest
object uuids somewhere which has to be cleared?

the harvesting is also very slow. we need to harvest nearly 3500
different iso19139 xml files describing web map services and want to
show them in ckan ;-) .

second question:
alternatively i thought about going the other (better) way and using a
push to publish datasets from our registry to ckan via the api (2+3).
creating and deleting seem to be possible via the different apis (2,
and the 3 action api for delete). some further problems exist: if i
delete a package via the action interface, the package is not really
deleted; instead its state attribute is set to "deleted". when i want
to recreate the object with the same name (maybe the local uuid from
our registry) i get a 403! maybe the package is no longer owned by my
user? how can this be prevented? or can i delete the whole package so
that i can recreate it afterwards? how can i get only those packages
that i have created myself via the api?
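
a minimal sketch of such a push, to make the create/delete calls
concrete. it is only an illustration under assumptions: the base url,
the api key, the /api/3/action/ path and the helper names are all
placeholders that do not come from our setup, and the uuid is an
invented example. it builds the requests for package_create and
package_delete, with the registry uuid used as the package name.

```python
import json
import urllib.request

# Assumptions for illustration only: adjust to the real instance.
CKAN_URL = "http://localhost:5000"
API_KEY = "my-api-key"


def build_action_request(action, payload, base_url=CKAN_URL, api_key=API_KEY):
    """Build (but do not send) a POST request for one action API call."""
    url = "{}/api/3/action/{}".format(base_url.rstrip("/"), action)
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": api_key,  # the api key of the publishing user
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def call_action(action, payload, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_action_request(action, payload, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_request(registry_uuid, title):
    """Request that creates a package named after the registry uuid."""
    return build_action_request(
        "package_create", {"name": registry_uuid, "title": title}
    )


def delete_request(registry_uuid):
    """Request that deletes the package (ckan only marks it as deleted)."""
    return build_action_request("package_delete", {"id": registry_uuid})
```

sending delete_request(...) followed by create_request(...) with the
same uuid is exactly the sequence that ends in the 403 described
above.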

i think the push way is better for keeping the information in sync :-)

thanx in advance

armin



