[openbiblio-dev] First cut of AsyncUpload branch

Rufus Pollock rufus.pollock at okfn.org
Fri Feb 10 09:05:20 UTC 2012


On 9 February 2012 20:47, Etienne Posthumus <etienne.posthumus at okfn.org> wrote:
> This is probably only interesting to developers:
>
> The first cut of the
> https://github.com/okfn/bibserver/wiki/AsyncUploadDesign is on the

Brilliant. Your IngestTicket looks just like a standard queue task in
something like celery <http://celeryproject.org/> (see below) --
Celery is the standard python task queue manager. Could be overkill
for what we are doing here.

BTW do we want to use the GitHub wiki or the existing
http://wiki.okfn.org/Projects/BibServer? I don't think it matters too
much but we should probably centralize on one or the other ...

> GitHub repository under the branch:
> https://github.com/okfn/bibserver/tree/asyncupload
> The crux of the 'new' code is in:
> https://github.com/okfn/bibserver/blob/asyncupload/bibserver/ingest.py
>
> When running a BibServer and a file upload is requested, the upload is
> not immediately performed, but an IngestTicket is created.
> A separate process (currently a command-line: python
> bibserver/ingest.py) is needed to actually do the download of the
> file, and parsing to the index.

All sounds great. Just to say, in case you are not aware of it, there
is a standard python library for doing async tasks called celery
<http://celeryproject.org/> which we have used heavily in other
projects and which I can recommend. If you want to talk about this
more I can connect you with Friedrich on the OpenSpending project or
David Raznick on CKAN, who have both used it heavily.
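
To make the ticket idea concrete, here is a minimal stdlib-only sketch of
the pattern described above: the web app only records a request, and a
separate process later picks up pending work. The field names, the
JSON-file storage and the helper functions are all assumptions made for
illustration; the real IngestTicket in ingest.py may look quite different.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class IngestTicket:
    """Hypothetical ticket record (not the class from ingest.py)."""
    source_url: str
    format: str
    state: str = "new"  # assumed states: new, done, failed
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created: float = field(default_factory=time.time)

    def save(self, ticket_dir):
        """Write the ticket to <ticket_dir>/<id>.json and return the path."""
        path = Path(ticket_dir) / f"{self.id}.json"
        path.write_text(json.dumps(asdict(self)))
        return path


def request_upload(ticket_dir, source_url, fmt):
    """Called by the web app: record the request, do no download or parsing."""
    ticket = IngestTicket(source_url=source_url, format=fmt)
    ticket.save(ticket_dir)
    return ticket


def pending_tickets(ticket_dir):
    """Called by the separate ingest process: list tickets still to be done."""
    tickets = []
    for path in sorted(Path(ticket_dir).glob("*.json")):
        data = json.loads(path.read_text())
        if data["state"] == "new":
            tickets.append(IngestTicket(**data))
    return tickets
```

The point of the split is that request_upload() returns immediately, so
the web request never waits on a slow download or parse.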

> This is also the first cut where the parsers do not run in-line as
> python code of the web server, but are separate 'black-box'
> executables that accept some input format on stdin and output
> converted BibJSON on stdout. This means that parsers do not have to be
> written in Python but could be in any language of a potential user who
> has an itch to scratch to get some bibliographic data format
> supported.

Sounds great again (though it may also be useful to have these
callable from python code in a standard manner so they can be used by
celery -- but I assume that is already the case)
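
For what it's worth, calling such a black-box parser from Python could be
as small as the sketch below. run_parser() is a hypothetical helper, not
something from the asyncupload branch, and the stand-in "parser" is just
a one-line Python script whose output is illustrative rather than valid
BibJSON.

```python
import json
import subprocess
import sys


def run_parser(command, raw_input, timeout=60):
    """Send raw bytes to an external parser on stdin; return its JSON output."""
    result = subprocess.run(
        command,
        input=raw_input,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the parser exits non-zero
    )
    return json.loads(result.stdout.decode("utf-8"))


# Stand-in parser for demonstration only: turns each non-empty input line
# into a record with a title. A real parser could be written in any language.
FAKE_PARSER = [
    sys.executable,
    "-c",
    "import sys, json; "
    "lines = sys.stdin.read().splitlines(); "
    "json.dump({'records': [{'title': t} for t in lines if t]}, sys.stdout)",
]

if __name__ == "__main__":
    print(run_parser(FAKE_PARSER, b"First title\nSecond title\n"))
```

Because the helper only depends on the stdin/stdout contract, the same
function could be called from a plain script or from inside a celery task.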

> In this branch, uploads from the local disk do not work. This is
> being worked on.
>
> Next steps:
> - Exposing tickets in a web page, so you can view what is
> pending/progress/history.

Celery provides access to its task queue; there's also stuff like:
<https://github.com/ask/celerymon>

> - Deciding how to make the ingest pipeline a long-running process.
> (simple while True: loop?, some form of messaging? polling?)

Celery runs a daemon that takes care of this.
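
If celery does turn out to be overkill, the "simple while True: loop"
option might look roughly like this. It is only a sketch: the function
name, its parameters and the idle cut-off are assumptions for
illustration, and fetch_pending / process stand for whatever the ingest
code actually provides.

```python
import time


def run_ingest_loop(fetch_pending, process, poll_interval=5.0,
                    max_idle_polls=None, sleep=time.sleep):
    """Poll for tickets and handle them; return the number handled.

    With max_idle_polls=None the loop runs forever. Otherwise it stops
    after that many consecutive polls that found nothing to do.
    """
    handled = 0
    idle = 0
    while True:
        tickets = fetch_pending()
        if tickets:
            idle = 0
            for ticket in tickets:
                try:
                    process(ticket)
                except Exception as exc:
                    # One bad ticket should not kill the long-running process.
                    print(f"ticket {ticket!r} failed: {exc}")
                handled += 1
        else:
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return handled
            sleep(poll_interval)
```

Polling like this is simple to reason about, at the cost of up to
poll_interval seconds of latency before a new ticket is noticed.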

> - Adding an option to only parse input, and not index it, allowing a
> running BibServer to be a parse 'service' for the locally installed
> formats, from where other BibServer instances could then import the
> parsed BibJSON. This could also function as a converter for other
> tools that might want to consume BibJSON but are unable to convert it
> themselves.

Sounds brilliant. convert.bibsoup.net would be great. I'm even
wondering whether our importer should in fact use that, i.e.
import.bibsoup.net is a client of convert.bibsoup.net. Going down that
line, are there any thoughts on the API (should be pretty simple, I
imagine)?

Rufus

> This feature is not set in stone yet, and on a separate branch.
> If you can read Python code, please take a look and send any feedback.
>
> cheers,
>
> Etienne
>
> _______________________________________________
> openbiblio-dev mailing list
> openbiblio-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/openbiblio-dev



-- 
Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/



