[ckan-dev] Harvest cron removal

David Read david.read at hackneyworkshop.com
Tue Jul 30 08:56:58 UTC 2013


I'm keen to make the CKAN harvester start immediately when you hit the
'refresh source' button. Currently it has to wait for the 'harvester
run' cron which is usually configured to occur only every 15 minutes.

I get plenty of support calls from organisations who are setting up a
harvester and iterating through problems. They are further frustrated
by this curious wait each time. Since the wait time is somewhat
'unknown' it also breeds distrust. And we all hate being made to
context-switch away while debugging.

From conversations with James, the 15 minute cron was originally
designed to ensure that a harvest source couldn't be harvested more
than once simultaneously and get confused. Since the harvest job
becomes multiple gathers, which are further split into fetches, James was
keen to make sure a source could only have one job for each run of the
cron. BUT I'm sure we could get the backend (Redis if not RabbitMQ) to
tell us the status of the job (and its related gather/fetches) and
stop further ones being created until it is done.
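
To make the idea concrete, here is a rough sketch of the guard I have in
mind. It is only an illustration: the class names, the status strings and
the in-memory list standing in for the backend are all made up for this
email and are not the ckanext-harvest API.

```python
# Hypothetical sketch: none of these names come from ckanext-harvest.
from dataclasses import dataclass, field

# Assumed status names for a job that has not finished yet.
ACTIVE_STATES = {"New", "Running"}


@dataclass
class HarvestJob:
    source_id: str
    status: str = "New"


@dataclass
class JobRegistry:
    """In-memory stand-in for the status the real backend would report."""
    jobs: list = field(default_factory=list)

    def active_job(self, source_id):
        """Return the unfinished job for this source, or None."""
        for job in self.jobs:
            if job.source_id == source_id and job.status in ACTIVE_STATES:
                return job
        return None

    def refresh_source(self, source_id):
        """Create a job straight away unless one is still in progress."""
        if self.active_job(source_id) is not None:
            raise RuntimeError(
                f"source {source_id!r} already has an unfinished job"
            )
        job = HarvestJob(source_id)
        self.jobs.append(job)
        return job
```

With something like this, 'refresh source' could call refresh_source()
directly and show the user an "already running" message on failure. A real
version would need the check and the create to be atomic in the backend,
and would have to count a job as finished only once all of its gathers and
fetches are done.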

Would the various CKAN sites using the harvester (OKF, Fraunhofer,
Canada, Washington etc.) be happy about this change? It would be on
the CKAN 2.x code.

Dave



