[openbiblio-dev] First cut of AsyncUpload branch

Etienne Posthumus etienne.posthumus at okfn.org
Thu Feb 9 20:47:48 UTC 2012


This is probably only interesting to developers:

The first cut of the design described at
https://github.com/okfn/bibserver/wiki/AsyncUploadDesign is in the
GitHub repository under the branch:
https://github.com/okfn/bibserver/tree/asyncupload
The crux of the new code is in:
https://github.com/okfn/bibserver/blob/asyncupload/bibserver/ingest.py

When a BibServer is running and a file upload is requested, the upload
is not performed immediately; instead, an IngestTicket is created.
A separate process (currently run from the command line: python
bibserver/ingest.py) is needed to actually download the file and
parse it into the index.
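
To make the split between the two processes concrete, here is a minimal
sketch of the ticket idea. Only the name IngestTicket comes from the
branch; the fields, the helper functions request_upload and
process_pending, and the in-memory list used as a queue are assumptions
for illustration and are not taken from ingest.py.

```python
from dataclasses import dataclass


@dataclass
class IngestTicket:
    # Field names are illustrative assumptions, not the ones in ingest.py.
    source_url: str
    format: str
    state: str = "new"
    error: str = ""


def request_upload(queue, source_url, fmt):
    """Web-server side: record the request and return straight away."""
    ticket = IngestTicket(source_url, fmt)
    queue.append(ticket)
    return ticket


def process_pending(queue, fetch, index):
    """Ingest-process side: download and index every ticket still marked new."""
    for ticket in queue:
        if ticket.state != "new":
            continue
        try:
            data = fetch(ticket.source_url)
            index(ticket, data)
            ticket.state = "done"
        except Exception as exc:
            # Keep the failure on the ticket so it can be inspected later.
            ticket.state = "failed"
            ticket.error = str(exc)
```

The fetch and index callables are passed in so that the sketch can be
tried without a network connection or a running index.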

This is also the first cut where the parsers do not run in-line as
Python code in the web server, but are separate 'black-box'
executables that accept some input format on stdin and output
converted BibJSON on stdout. This means that parsers do not have to be
written in Python; they can be written in whatever language suits a
potential user who has an itch to scratch to get some bibliographic
data format supported.
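
As a rough illustration of the stdin/stdout contract, the sketch below
runs a parser as a child process and decodes the JSON it prints. The
run_parser helper, the toy tab-separated input format and the record
shape it emits are all assumptions made up for this example; they are
not the code or the formats used in the branch.

```python
import json
import subprocess
import sys

# Stand-in for an external parser executable. It reads "key<TAB>title"
# lines on stdin and writes a JSON list of records on stdout. A real
# parser could be written in any language.
TOY_PARSER = r'''
import json, sys
records = []
for line in sys.stdin:
    line = line.rstrip("\n")
    if not line:
        continue
    key, title = line.split("\t", 1)
    records.append({"id": key, "title": title})
json.dump(records, sys.stdout)
'''


def run_parser(command, raw_bytes):
    """Feed raw_bytes to the parser on stdin and decode the JSON it prints."""
    proc = subprocess.run(
        command,
        input=raw_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode != 0:
        message = proc.stderr.decode("utf-8", "replace")
        raise RuntimeError("parser failed: " + message)
    return json.loads(proc.stdout.decode("utf-8"))


# Here the "executable" is the Python interpreter running the toy script.
toy_command = [sys.executable, "-c", TOY_PARSER]
records = run_parser(toy_command, b"k1\tA Title\nk2\tAnother\n")
```

The calling side only needs to know a command line per format, which is
what keeps the parsers black boxes.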

In this branch, uploads from the local disk do not work yet. This is
being worked on.

Next steps:
- Exposing tickets in a web page, so you can view what is pending, in
progress, or already in the history.
- Deciding how to make the ingest pipeline a long-running process.
(simple while True: loop?, some form of messaging? polling?)
- Adding an option to only parse input, and not index it, allowing a
running BibServer to be a parse 'service' for the locally installed
formats, from where other BibServer instances could then import the
parsed BibJSON. This could also function as a converter for other
tools that might want to consume BibJSON but are unable to do the
conversion themselves.
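
For the long-running question, the simplest of the options listed (a
polling loop) could look roughly like the sketch below. The names
run_forever and process_once are hypothetical, and the should_stop and
sleep parameters are there only so the loop can be exercised without
waiting; none of this is in the branch yet.

```python
import time


def run_forever(process_once, poll_seconds=5.0,
                should_stop=lambda: False, sleep=time.sleep):
    """Call process_once() repeatedly, sleeping only when it found no work.

    process_once is expected to return the number of tickets it handled.
    """
    while not should_stop():
        handled = process_once()
        if handled == 0:
            # Nothing pending: wait before polling again.
            sleep(poll_seconds)
```

Polling keeps the moving parts to a minimum; a messaging system would
remove the delay between a ticket being created and being picked up, at
the cost of another service to run.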

This feature is not set in stone yet, and lives on a separate branch.
If you can read Python code, please take a look and send any feedback.

cheers,

Etienne



