[okfn-labs] OpenSpending Architectural Roadmap

Rufus Pollock rufus.pollock at okfn.org
Tue Apr 21 06:38:33 UTC 2015

On 16 April 2015 at 22:24, Tom Morris <tfmorris at gmail.com> wrote:

> I was looking at the new roadmap that was announced yesterday (but
> apparently written 2 years ago and "accepted" by ___? in February).  It's
> hard to argue against more modular with smaller components and less
> coupling, but I'm not sure I understand the relationship between
> openspending.org and data.okfn.org.  Is the big "DataStore" box in the
> middle of the architecture diagram actually a data.okfn.org service?  Is
> http://data.okfn.org/data/openspending/gov-spending-gb-central just a
> bulk download of the data behind openspending.org?

Really good questions Tom. Let me try and explain:

- data.okfn.org is not a data *repository*. It is the home of the
Frictionless Data project and currently includes a *registry* of data
packages at http://data.okfn.org/data/. That registry includes a "viewer"
but it is limited. That registry may evolve (maybe like the node registry)
or maybe not. In addition the registry is basically for any valid data
package. It is definitely not a "DataStore" - storage of the data packages
is somewhere else (GitHub, S3 etc.) and up to the creators of that data.

- OpenSpending and its DataStore. So the idea is for OpenSpending to start
managing its datasets as (Tabular) Data Packages (or more specifically
Data Packages).
These data-packaged datasets will live in the OpenSpending DataStore, which
will likely just be an S3 bucket plus some access control. In addition to
being stored, the datasets get a whole bunch of processing (e.g.
aggregations), and we will pull the data out of them to create a data
API - see the detailed diagram in OpenSpending Enhancement Proposal 1. Of
course, as the OS datasets are Data Packages they will be viewable with
http://data.okfn.org/tools/view and they may even show up in the registry
but that's secondary (though a nice benefit!)
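
To make the OpenSpending bullet a little more concrete, here is a rough
Python sketch of a spending dataset described as a Tabular Data Package,
plus the kind of aggregation the processing step might run. The dataset
name, file name, column names and the aggregate_by helper are all made up
for illustration; only the descriptor keys (name, resources, path, schema,
fields) follow the Data Package spec. This is a sketch under those
assumptions, not the actual OpenSpending implementation.

```python
import csv
import io
import json
from collections import defaultdict
from decimal import Decimal

# Hypothetical descriptor: the content that would be saved as
# datapackage.json next to the CSV in the DataStore bucket.
descriptor = {
    "name": "example-spending",  # made-up dataset name
    "resources": [
        {
            "path": "spending.csv",  # made-up file name
            "schema": {
                "fields": [
                    {"name": "date", "type": "date"},
                    {"name": "department", "type": "string"},
                    {"name": "amount", "type": "number"},
                ]
            },
        }
    ],
}

# Inline sample data standing in for spending.csv (invented rows).
SAMPLE_CSV = """date,department,amount
2015-01-05,Health,100.50
2015-01-06,Education,40.00
2015-01-07,Health,9.50
"""


def aggregate_by(rows, key, value):
    """Sum the `value` column for each distinct `key` (hypothetical helper)."""
    totals = defaultdict(Decimal)
    for row in rows:
        # Decimal avoids float rounding on money amounts.
        totals[row[key]] += Decimal(row[value])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
totals = aggregate_by(rows, "department", "amount")
descriptor_json = json.dumps(descriptor, indent=2)

print(descriptor_json)
for department, total in sorted(totals.items()):
    print(department, total)
```

The point is only that the descriptor carries enough schema information
for a generic processing step to know which columns it can group and sum.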

> Since any data slicing, aggregation, filtering, etc API is going to be 99%
> the same across different data sets, I can't see creating a specific API
> just for spending data.

That's an interesting point. It may turn out that the functionality for
OpenSpending is generic. However, right now it will be focused on the OS
use-case (so we don't get distracted trying to generalise). I note that you
can already push Data Packages to a CKAN DataStore and get an API.

> My mental model is that this would work like an open data / open source
> version of DataMarket (apparently acquired a few months back) with:
> a dataset catalog - https://datamarket.com/data/
> an API which works with any data set - https://datamarket.com/api/v1/

Again an interesting thought. I would point out that CKAN already does this
pretty well - it has both a catalog and a DataStore
<http://docs.ckan.org/en/latest/maintaining/datastore.html> that gives you
a rich and powerful API for structured data (e.g. tabular and geo).
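
As a rough illustration of what that API looks like from the client side,
here is a Python sketch that builds a request URL for CKAN's
datastore_search action. The CKAN host, the resource id and the
"department" column are placeholders I have invented; the action name and
the resource_id, filters and limit parameters are from the CKAN DataStore
docs. The sketch only constructs the URL and does not call a real server.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def datastore_search_url(ckan_base, resource_id, filters=None, limit=100):
    """Build a datastore_search URL (helper name is hypothetical)."""
    params = {"resource_id": resource_id, "limit": limit}
    if filters:
        # CKAN expects `filters` as a JSON-encoded object of column: value.
        params["filters"] = json.dumps(filters)
    base = ckan_base.rstrip("/")
    return base + "/api/3/action/datastore_search?" + urlencode(params)


# Placeholder host and resource id, for illustration only.
url = datastore_search_url(
    "https://ckan.example.org",
    "hypothetical-resource-id",
    filters={"department": "Health"},
    limit=5,
)

# Decode the query string again to show what the server would receive.
query = parse_qs(urlparse(url).query)
print(url)
print(query)
```

The same call shape works for any tabular resource in a DataStore, which
is the sense in which the API is not specific to one kind of data.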

> Is that how the new architecture is intended to work?  If not, are there
> disadvantages to reusing the API and data storage infrastructure for all
> data sets?

As we are only just starting to build the new OpenSpending architecture it
may be too early to say. However, it was really useful to flag this
potential reuse - not something that I'd really thought about.

Please keep the thoughts and suggestions coming!

