[okfn-labs] Datapackage + Bubbles Demo

Stefan Urbanek stefan.urbanek at gmail.com
Thu Feb 20 00:58:54 UTC 2014


Hi,

Here is a short demo of Bubbles[1] using a Data Package collection store:

	https://gist.github.com/Stiivi/9104719

In the Gist you will find the example Python code, a list of the required datasets and their modifications, and stripped-down example output.

The example is artificial, but it at least shows:

* how the datapackage store is used: how to access datapackage resources as data objects
* how a Pipeline is constructed
* a simple master-detail join
* aggregation with a composite key
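
To make the last two points concrete, here is a minimal plain-Python sketch of a master-detail join followed by an aggregation on a composite key. It does not use the Bubbles API: the function names, field names and sample rows are all made up for illustration, and the actual Gist may structure things differently.

```python
from collections import defaultdict

# Hypothetical sample data: departments are the master records,
# payments are the detail records that refer to them by dept_id.
departments = [
    {"dept_id": 1, "dept_name": "Health"},
    {"dept_id": 2, "dept_name": "Education"},
]
payments = [
    {"dept_id": 1, "year": 2012, "amount": 100},
    {"dept_id": 1, "year": 2012, "amount": 50},
    {"dept_id": 1, "year": 2013, "amount": 70},
    {"dept_id": 2, "year": 2013, "amount": 30},
]


def join_master_detail(master_rows, detail_rows, key):
    """Yield each detail row merged with the master row sharing its key."""
    masters = {row[key]: row for row in master_rows}
    for detail in detail_rows:
        master = masters.get(detail[key])
        if master is not None:
            yield {**master, **detail}


def aggregate(rows, keys, measure):
    """Sum `measure` per distinct combination of the `keys` fields."""
    totals = defaultdict(int)
    for row in rows:
        composite_key = tuple(row[k] for k in keys)
        totals[composite_key] += row[measure]
    return dict(totals)


joined = join_master_detail(departments, payments, "dept_id")
totals = aggregate(joined, ("dept_name", "year"), "amount")
print(totals)
```

The join is a generator, so rows flow into the aggregation one at a time, which is roughly how an iterator-based backend would process them.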

The "Data Package collection store" is a directory with datapackages in it. Data objects are named "PACKAGE.RESOURCE"; if a package has only one resource, the name is just "PACKAGE".
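
The naming rule can be sketched in a few lines of standard-library Python. This is only an illustration of the convention described above, not the code Bubbles uses: `object_names` is a hypothetical helper, and it assumes every package directory contains a `datapackage.json` whose resources each carry a `name`.

```python
import json
import os
import tempfile


def object_names(store_path):
    """List data object names for a directory of datapackages."""
    names = []
    for entry in sorted(os.listdir(store_path)):
        descriptor = os.path.join(store_path, entry, "datapackage.json")
        if not os.path.isfile(descriptor):
            continue  # not a datapackage directory
        with open(descriptor) as f:
            package = json.load(f)
        package_name = package.get("name", entry)
        resources = package.get("resources", [])
        if len(resources) == 1:
            # Single resource: the package name alone identifies it.
            names.append(package_name)
        else:
            # Assumption: each resource has a "name" field.
            for resource in resources:
                names.append(f"{package_name}.{resource['name']}")
    return names


# Build a throwaway store with two made-up packages and list its objects.
with tempfile.TemporaryDirectory() as store:
    sample = [("gdp", ["gdp"]), ("country-codes", ["codes", "regions"])]
    for name, resources in sample:
        os.makedirs(os.path.join(store, name))
        with open(os.path.join(store, name, "datapackage.json"), "w") as f:
            json.dump({"name": name,
                       "resources": [{"name": r} for r in resources]}, f)
    names = object_names(store)

print(names)
```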

Note that if the same code were run on top of a SQL database source, SQL queries (or maybe just one in this case) would be composed and executed instead of Python iterators, transparently.
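
As a rough idea of what "just one query" could look like, the sketch below runs a join plus a composite-key aggregation as a single SQL statement against an in-memory SQLite database. The table names, columns and rows are invented for the illustration, and the SQL that Bubbles would actually compose is not shown in this message and may well differ.

```python
import sqlite3

# Hypothetical tables standing in for a master and a detail dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)")
conn.execute("CREATE TABLE payments (dept_id INTEGER, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Health"), (2, "Education")],
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(1, 2012, 100), (1, 2012, 50), (1, 2013, 70), (2, 2013, 30)],
)

# The join and the aggregation on (dept_name, year) collapse into one
# statement, so the database does the work instead of Python iterators.
query = """
    SELECT d.dept_name, p.year, SUM(p.amount) AS total
    FROM payments AS p
    JOIN departments AS d ON d.dept_id = p.dept_id
    GROUP BY d.dept_name, p.year
    ORDER BY d.dept_name, p.year
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```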

My observation during development: the Data Package and Simple Data Format are great. They just need a bit of refinement and confrontation with real uses (by tools, not human eyes). They need to focus more on machine-processable and easy-to-use metadata.

Comments, questions and suggestions are welcome.

Enjoy,

Stefan Urbanek

[1] https://github.com/stiivi/bubbles

Twitter: @Stiivi
Personal: stiivi.com
Data Brewery: databrewery.org
