[ckan-dev] Harvest cron removal

David Raznick david.raznick at okfn.org
Thu Aug 1 08:10:52 UTC 2013


Hello

For data.gov we lowered the cron interval to 3 minutes. We could do that
because the Redis consumer already notices duplicates on the gather queue.
It could do a better job, though, of also making sure it does not re-add
tasks that are currently running.
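
To make that concrete, here is a minimal sketch of the kind of guard I mean.
It is an in-memory stand-in, not the actual ckanext-harvest or Redis code, and
the class and method names are hypothetical. A job is only enqueued if it is
neither waiting on the gather queue nor currently being processed.

```python
class GatherQueueGuard:
    """Hypothetical stand-in for the gather queue's bookkeeping."""

    def __init__(self):
        self.queued = []      # job ids waiting to be gathered, oldest first
        self.running = set()  # job ids a consumer has already picked up

    def enqueue(self, job_id):
        """Add job_id unless it is already queued or running."""
        if job_id in self.queued or job_id in self.running:
            return False
        self.queued.append(job_id)
        return True

    def start_next(self):
        """Move the oldest queued job to the running set and return it."""
        if not self.queued:
            return None
        job_id = self.queued.pop(0)
        self.running.add(job_id)
        return job_id

    def finish(self, job_id):
        """Mark a running job as done so it can be queued again later."""
        self.running.discard(job_id)
```

With a check like this, a short cron interval (or an immediate trigger) cannot
create a second job for a source whose gather is still in progress.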

The other concern with adding items to the gather stage immediately is that
it is quite common for the harvesters and queue to be on a different server
than the web server.  It is also common for these to have no communication
with each other as all they do is look at the same db.  So whatever happens
here we will have to look at keeping the old behaviour too.
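
As a rough illustration of that setup, the sketch below has the web server and
the harvester host coordinate only through a shared table. It uses sqlite3 so
it is self-contained; the `job_request` table and the function names are
assumptions for the example, not the real ckanext-harvest schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical request table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job_request ("
        "id INTEGER PRIMARY KEY, source_id TEXT, status TEXT)"
    )
    return conn


def request_harvest(conn, source_id):
    """Web-server side: record that a source should be harvested."""
    conn.execute(
        "INSERT INTO job_request (source_id, status) VALUES (?, 'new')",
        (source_id,),
    )
    conn.commit()


def claim_new_requests(conn):
    """Harvester side: run on each poll and claim every unclaimed request."""
    rows = conn.execute(
        "SELECT id, source_id FROM job_request "
        "WHERE status = 'new' ORDER BY id"
    ).fetchall()
    for row_id, _source_id in rows:
        conn.execute(
            "UPDATE job_request SET status = 'running' WHERE id = ?",
            (row_id,),
        )
    conn.commit()
    return [source_id for _row_id, source_id in rows]
```

In this arrangement the wait the user sees is bounded by how often the
harvester host polls, which is why shortening the interval helps even when
the two machines never talk to each other directly.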

Thanks

David


On Tue, Jul 30, 2013 at 2:24 PM, David Read
<david.read at hackneyworkshop.com> wrote:

> Cheers Ross,
>
> David
>
> On 30 July 2013 13:37, Ross Thompson <ross.thompson.ca at gmail.com> wrote:
> > We (Canada/data.gc.ca) are not using the harvester at the moment, so
> > this does not have an impact on us.
> >
> > Thanks.
> >
> >
> >
> > On 30 July 2013 04:56, David Read <david.read at hackneyworkshop.com> wrote:
> >>
> >> I'm keen to make the CKAN harvester start immediately when you hit the
> >> 'refresh source' button. Currently it has to wait for the 'harvester
> >> run' cron which is usually configured to occur only every 15 minutes.
> >>
> >> I get plenty of support calls from organisations who are setting up a
> >> harvester and are iterating through problems. They are further frustrated
> >> by this curious wait each time. Since the wait time is somewhat
> >> 'unknown' it also breeds distrust. And we all hate being made to
> >> context-switch away while debugging.
> >>
> >> From conversations with James, the 15 minute cron was originally
> >> designed to ensure that a harvest source couldn't be harvested more
> >> than once simultaneously and get confused. Since the harvest job
> >> becomes multiple gathers and further split into fetches, James was
> >> keen to make sure a source could only have one job for each run of the
> >> cron. BUT I'm sure we could get the backend (Redis if not RabbitMQ) to
> >> tell us the status of the job (and its related gather/fetches) and
> >> stop further ones being created until it is done.
> >>
> >> Would the various CKAN sites using the harvester (OKF, Fraunhofer,
> >> Canada, Washington etc.) be happy about this change? It would be on
> >> the CKAN 2.x code.
> >>
> >> Dave
> >>
> >> _______________________________________________
> >> ckan-dev mailing list
> >> ckan-dev at lists.okfn.org
> >> http://lists.okfn.org/mailman/listinfo/ckan-dev
> >> Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev