[ckan-dev] Need to call ckanext-harvest "run" twice?

Stefan Oderbolz stefan.oderbolz at liip.ch
Tue Sep 10 22:04:05 UTC 2013


Hi Adrià,

that's exactly what I thought, and the way I have always used the commands.
I was just not sure whether this was me accepting an inconvenience or
whether it is actually how it's designed. Glad to know it's the latter.
Thanks for the clarification!

Regards Stefan


On Tue, Sep 10, 2013 at 11:51 AM, Adrià Mercader <adria.mercader at okfn.org> wrote:

> Hi Stefan,
>
> Yes, you are correct: to properly finish a harvest job you need to run
> the "run" command twice.
> This is because the harvesting process is completely asynchronous,
> as it generally involves big volumes of data.
> The first time you run the "run" command, the job is just sent to the
> gather queue and the command exits. Harvest objects are then created
> for each remote document and sent to another queue. Each object has a
> state property, and when its import stage is finished this is set to
> "complete" (or "error").
> The only way to know if a job has actually finished is to check, at
> some point, whether all the objects for that job have one of these
> states. The "run" command is a good place to do that, as most likely
> you are going to set it up as a cron job to run regularly, so you can
> check for finished jobs every X minutes.
> When developing, this is slightly inconvenient because you have to
> remember to run the "run" command again to mark the job as finished and
> be able to create another one (and get the last job details indexed).
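
To make the two-pass behaviour described above concrete, here is a toy
Python model of it. It is not the actual ckanext-harvest code: the
HarvestJob and HarvestObject classes, the "new"/"running"/"finished" job
statuses and the run() function are all made up for illustration. Only the
"complete" and "error" object states are taken from the explanation above.

```python
from dataclasses import dataclass, field
from typing import List

# The two end states for a harvest object named in the explanation above.
FINAL_STATES = {"complete", "error"}


@dataclass
class HarvestObject:
    # Hypothetical initial state; the real extension may use other names.
    state: str = "new"


@dataclass
class HarvestJob:
    # Hypothetical job statuses: "new" -> "running" -> "finished".
    status: str = "new"
    objects: List[HarvestObject] = field(default_factory=list)


def run(jobs: List[HarvestJob]) -> None:
    """One invocation of a simplified "run" command.

    New jobs are only handed over to the (imaginary) gather queue.
    Running jobs are marked as finished once every one of their objects
    has reached a final state. Nothing here waits for the queues.
    """
    for job in jobs:
        if job.status == "new":
            # First pass: queue the job and return immediately.
            job.status = "running"
        elif job.status == "running":
            # Later pass: check whether all objects are done.
            # Simplification: a job without any objects stays "running".
            if job.objects and all(o.state in FINAL_STATES for o in job.objects):
                job.status = "finished"
```

In this model, calling run() once only moves a job to "running". A second
call, made after the queue consumers have set every object to "complete"
or "error", is what marks the job as "finished".
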
>
> There is some sparse documentation here, which we intend to improve soon:
>
> https://github.com/okfn/ckanext-harvest#running-the-harvest-jobs
>
> Hope this helps,
>
> Adrià
>
>
> On 9 September 2013 12:44, Stefan Oderbolz <stefan.oderbolz at liip.ch>
> wrote:
> > Hi there,
> >
> > just wanted to know if this is a common practice or if we do something
> > wrong:
> > In all harvesters (custom ones based on ckanext-harvest as well as
> > ckanext-harvest itself) we need to call the paster run command twice in
> > order to mark the harvester as finished. At least, the job is only
> > marked as finished in the web interface after running the "run"
> > command for the second time.
> >
> > For me, this means I always have to run this command twice. Is that
> > correct?
> > If not, what do I have to do to mark a command as "finished"?
> >
> > Best Regards
> > Stefan
> >
> > --
> > Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
> > Tel +41 43 500 39 80 // GnuPG 0x7B588C67 // www.liip.ch
> >
> > _______________________________________________
> > ckan-dev mailing list
> > ckan-dev at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/ckan-dev
> > Unsubscribe: http://lists.okfn.org/mailman/options/ckan-dev
> >
>



-- 
Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
Tel +41 43 500 39 80 // GnuPG 0x7B588C67 // www.liip.ch