[openbiblio-dev] First cut of AsyncUpload branch

Fri Feb 10 09:59:24 UTC 2012

On 10 February 2012 10:05, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> All sounds great. Just to say, in case you are not aware of it, there
> is a standard python library for doing async tasks called celery

Yup, I have used celery on other projects.
It is great, but it felt like overkill for what we need right now.

> Celery provides access to its task queue, there's also stuff like:
> <https://github.com/ask/celerymon>

Yes, but that is fairly generic. As soon as you want
application-specific things, such as "How far is my download?" or
"What went wrong with the parse?", you have to customise it anyway.
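
To illustrate what I mean by application-specific status, here is a
rough sketch of a per-upload record that could answer those two
questions. Everything in it (the IngestTicket name, the fields, the
state names) is made up for illustration and is not necessarily what
the branch does.

```python
import json
import time


class IngestTicket:
    """Hypothetical per-upload status record (illustration only)."""

    def __init__(self, ticket_id, source_url):
        self.data = {
            "id": ticket_id,
            "source_url": source_url,
            # Assumed lifecycle: new -> downloading -> parsing -> done / failed
            "state": "new",
            "bytes_done": 0,
            "bytes_total": None,  # unknown until the remote server reports it
            "error": None,
            "updated": time.time(),
        }

    def update(self, **fields):
        """Change any fields and refresh the timestamp."""
        self.data.update(fields)
        self.data["updated"] = time.time()

    def fail(self, stage, message):
        """Record which stage broke and why ("What went wrong with the parse?")."""
        self.update(state="failed", error={"stage": stage, "message": message})

    def progress(self):
        """Percent downloaded ("How far is my download?"), or None if unknown."""
        total = self.data["bytes_total"]
        if not total:
            return None
        return round(100.0 * self.data["bytes_done"] / total, 1)

    def to_json(self):
        """Serialise the record, e.g. for a status page or a file on disk."""
        return json.dumps(self.data, sort_keys=True)
```

A generic task monitor can tell you that a task is running or has
failed; a record like this is what lets the UI say "25% downloaded"
or "failed while parsing".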

>> - Deciding how to make the ingest pipeline a long-running process.
>> (simple while True: loop?, some form of messaging? polling?)
>
> Celery runs a daemon that takes care of this.

Yep, but you also have to install and maintain a message queue, which
means more components and more complexity.
(With Celery I prefer Redis as the backend: it is not a queue per se,
but it shines in ease of use, performance and versatility.)

We have started using Supervisor on the bibsoup server for managing
processes, and I was thinking of letting the ingest procedure be
managed by that.
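
For the simple "while True:" option, a minimal sketch of a polling
ingest loop might look like the following. It assumes tickets are
dropped as .json files into a queue directory; the directory layout,
the function names and the handle callback are all hypothetical. The
idea is that the loop only survives errors in individual tickets, and
a process manager such as Supervisor restarts it if the process itself
dies.

```python
import os
import sys
import time


def pending_tickets(queue_dir):
    """Return paths of ticket files waiting in queue_dir, oldest first."""
    paths = [
        os.path.join(queue_dir, name)
        for name in os.listdir(queue_dir)
        if name.endswith(".json")
    ]
    return sorted(paths, key=os.path.getmtime)


def run_ingest_loop(queue_dir, handle, poll_seconds=5.0, max_cycles=None):
    """Poll queue_dir and call handle(path) for each pending ticket.

    Runs forever when max_cycles is None; max_cycles exists only so the
    loop can be exercised in tests. Returns the number of tickets seen.
    """
    cycles = 0
    handled = 0
    while max_cycles is None or cycles < max_cycles:
        for path in pending_tickets(queue_dir):
            try:
                handle(path)
            except Exception as exc:
                # One bad ticket must not kill the loop: park it for inspection.
                sys.stderr.write("ingest failed for %s: %s\n" % (path, exc))
                os.rename(path, path + ".failed")
            else:
                os.remove(path)
            handled += 1
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(poll_seconds)
    return handled
```

Polling a directory needs no extra components, which is the trade-off
against a message queue: less to install and maintain, at the cost of
a delay of up to poll_seconds before a new ticket is picked up.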

> Sounds brilliant. convert.bibsoup.net would be great. I'm even
> wondering whether our importer should in fact use that i.e.
> import.bibsoup.net is a client of convert.bibsoup.net. Going down that
> line are there any thoughts on the API (should be pretty simple i
> imagine).

Exactly! The precise mechanism and API are not clear yet, but in
general this is where we want to go.