[okfn-labs] Working with Bubbles for a data processing pipeline

Paul Walsh paulywalsh at gmail.com
Sun Dec 28 21:44:16 UTC 2014


This is mostly targeted at Stefan, creator of Bubbles (http://pythonhosted.org/bubbles/index.html), but I'm posting to the list for other input/general knowledge.

I’ve had a brief discussion with Rufus about Bubbles and have looked over the documentation. I’m wondering how I could integrate it into my current work on CSV validation, but some areas of the docs are empty or light. At this stage it looks, in theory, like it could provide much of the backend for what we want to do, which, in summary, is:

Take a data source (CSV) and stream it through a validation pipeline:
    - check for structural issues (missing/extra columns and rows, etc.)
    - validate the data against a JSON schema
    - transform the stream in the pipeline (e.g.: remove empty rows before running the JTS validation)
        - (save the original source and transformed source to file)
    - generate reports for each “component” in the pipeline
        - compile the reports from each component into a “master” report
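
To make the shape of the pipeline above concrete, here is a minimal sketch in plain Python using only the standard library. It deliberately does not use Bubbles, since it is not clear to me how its API would map onto this. All of the names here (structure_check, drop_empty_rows, run_pipeline) are hypothetical, and the report format is only an assumption for illustration. Each component is a generator that takes the row stream plus its own report list, so rows are streamed through the components one at a time, and the per-component reports are collected into one "master" dictionary.

```python
import csv
import io


def structure_check(rows, report):
    """Pass rows through unchanged; record rows whose width differs from the header."""
    header = None
    for index, row in enumerate(rows):
        if header is None:
            header = row
        elif row and len(row) != len(header):
            report.append({
                "row": index,
                "message": "expected %d columns, got %d" % (len(header), len(row)),
            })
        yield row


def drop_empty_rows(rows, report):
    """Transform the stream: remove rows that have no non-blank cells."""
    for index, row in enumerate(rows):
        if not any(cell.strip() for cell in row):
            report.append({"row": index, "message": "empty row removed"})
            continue
        yield row


def run_pipeline(source, components):
    """Chain the components over a CSV file object.

    Returns the transformed rows and a master report keyed by component name.
    """
    master_report = {}
    stream = csv.reader(source)
    for component in components:
        component_report = []
        master_report[component.__name__] = component_report
        stream = component(stream, component_report)
    transformed = list(stream)  # consuming the stream drives every component
    return transformed, master_report


if __name__ == "__main__":
    source = io.StringIO(u"id,name\n1,alice\n\n2,bob,extra\n")
    rows, report = run_pipeline(source, [structure_check, drop_empty_rows])
    print(rows)
    print(report)
```

A schema-validation component would slot into the same list after drop_empty_rows, and saving the transformed source would just be a matter of writing the returned rows back out with csv.writer.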

The API will be usable via Python for integration with existing codebases. Additionally, it will be wrapped as an HTTP service.

The Bubbles docs cover several relevant areas we’d need (e.g.: http://pythonhosted.org/bubbles/operations.html#field-operations), and there is a small section on custom operations, but it is not entirely clear (to me) how I might use Bubbles to construct our whole pipeline and generate reports in a format of our choosing.

An important (deal-breaking) issue is Python 2.7 support: 

I’m currently writing in Python 3, but aim to target both 2 and 3 (we **need** to integrate with Python 2.7 codebases). It is pretty clear that Bubbles is Py3 only, but we thought we’d ask anyway, and find out what it might take to get Bubbles working on 2.7.x.
