[ckan-discuss] Harvester questions/issues

Stéphane Guidoin stephane at opennorth.ca
Tue Oct 29 16:18:39 GMT 2013


Hello,

I am currently evaluating how to automate the maintenance of datasets for a
CKAN implementation (City of Montréal).

I am looking at the harvester, but I have not been able to get it working.

I installed both the CKAN and CSW harvesters following the only guide
(https://github.com/okfn/ckanext-harvest), using RabbitMQ.

I am able to access http://my_instance:5000/harvest, and the basic harvester
CLI commands work (I am able to add sources, list them, etc.).

In order to run a manual job, I did the following:

Add the source (CSW):
paster --plugin=ckanext-harvest harvester source mysource \
  "http://www.donnees.gouv.qc.ca/geonetwork/srv/eng/csw?SERVICE=CSW&request=GetCapabilities&AcceptVersion=2.0.2" \
  csw "My Source" --config=/etc/ckan/default/production.ini

Add a job for this source:
paster --plugin=ckanext-harvest harvester job 01aa2038-678b-4ce5-972e-ad4eaf9198f6 \
  --config=/etc/ckan/default/production.ini

(If I ask for the list of jobs, this one appears as "new")

Then, in separate consoles, I start the gather and fetch consumers:
paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/production.ini
paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/production.ini

And in a third console, I launch the run command:
paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/production.ini
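
Since every step above repeats the same plugin and config arguments, the sequence can be written more compactly with a small wrapper. This is only a sketch: the `harvester` function and the `DRY_RUN` switch are hypothetical helpers, not part of ckanext-harvest, and the subcommands are simply the ones used above.

```shell
# Config path taken from the commands above.
CONFIG=/etc/ckan/default/production.ini

# Hypothetical helper (not part of ckanext-harvest): wraps the repeated
# paster invocation. With DRY_RUN=1 it prints the command instead of running it.
harvester() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo paster --plugin=ckanext-harvest harvester "$@" --config="$CONFIG"
  else
    paster --plugin=ckanext-harvest harvester "$@" --config="$CONFIG"
  fi
}

# Usage, in the same order as the steps above:
#   harvester job 01aa2038-678b-4ce5-972e-ad4eaf9198f6
#   harvester gather_consumer     # console 1
#   harvester fetch_consumer      # console 2
#   harvester run                 # console 3
```

Setting `DRY_RUN=1` before calling the function makes it easy to confirm that every console is pointed at the same config file.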

After launching the run command, I receive a message telling me the job has
been sent to the gather queue:
2013-10-28 14:13:40,550 INFO  [ckanext.harvest.logic.action.update] Sent job 97bd1013-9160-4359-9b5a-243fbd8bc30a to the gather queue

But nothing happens: the gather consumer does nothing, and no process runs
even if I leave this running for hours. The job appears as running.

If I try to delete the job (purge_queue), it remains listed as running. If
I add the job again, I am told there is already a job for that source, but
if I launch "run" again, I am told there are no new jobs.

How is it supposed to behave? Did I miss something? Any idea if/where I
could find relevant logs? (RabbitMQ's log in /var/log/rabbitmq does not
show anything.)
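
One check that might narrow this down, sketched under assumptions: `rabbitmqctl list_queues name messages consumers` prints one tab-separated row per queue, so a queue that holds messages but has no consumer attached would suggest that the consumers and the "run" command are not talking to the same queue. The `stuck_queues` function below is a hypothetical helper, and the actual queue names depend on the ckanext-harvest version installed.

```shell
# Hypothetical helper: reads tab-separated "name messages consumers" rows
# (the column order requested from rabbitmqctl below) and prints the names
# of queues that hold messages but have no consumer attached.
stuck_queues() {
  awk -F'\t' 'NF == 3 && $2 ~ /^[0-9]+$/ && $2 > 0 && $3 == 0 { print $1 }'
}

# Usage on the RabbitMQ host (needs sufficient privileges):
#   rabbitmqctl list_queues name messages consumers | stuck_queues
```

An empty result would not prove that the setup is correct; it would only rule out this one failure mode.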

Slightly related question: what URL should be configured for both the CKAN
and CSW harvesters? For CKAN, should I link to the CKAN root page or
directly to the API? For CSW, do I point to the base CSW resource or to the
"GetCapabilities" request?
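
For the CSW case, the two candidates differ only in the query string. The snippet below is just a sketch of deriving the bare endpoint from the GetCapabilities URL registered above; which of the two forms the harvester expects is exactly the open question, so nothing here assumes the answer. The variable names are illustrative.

```shell
# Source URL as registered above (a full GetCapabilities request).
url='http://www.donnees.gouv.qc.ca/geonetwork/srv/eng/csw?SERVICE=CSW&request=GetCapabilities&AcceptVersion=2.0.2'

# Strip everything from the first "?" onward to get the bare CSW endpoint.
base_url="${url%%\?*}"
echo "$base_url"
```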

Thank you

Steph