[openspending-dev] OpenSpending - Thoughts on Approach and Architecture

Rufus Pollock rufus.pollock at okfn.org
Thu Apr 4 20:28:02 UTC 2013

I wanted to put down some reflections that distil my understanding
(and thoughts) on where we are going with our approach and
architecture.

[Note: I've also put this in a gdoc version to make annotating /
commenting easier [1]].

Single statement summary:

  We want to centralize data but decentralize "presentation" ("views")

By “presentation” (views) I mean presentations of that data to people
in the broadest sense - it could be a visualization and discussion in
a news article or a dedicated site like Where Does My Money Go.

To elaborate this a bit, it means:

1. OS provides a single central repository of open data on government
(and corporate) finances
2. OS provides good access (APIs, dumps) but quite basic presentation
of that data (browser, some viz)
3. Most of the presentation of that data happens on non-OS sites but
using OS data (via the API, via dump etc)

Some of 3 may be done by members of the "OpenSpending" community, and
we care a great deal about 3 (that stuff is the point of having 1+2),
BUT OS, at least as a technical project, is focused on 1+2.

This means OpenSpending technically is about:

- DB: Maintaining that central repository (note this need *not* be a
classic relational DB - it could be files on S3 or ...)
- ETL: Providing means to get data into that repository
- API + Dumps: Providing means to get data out of that repository
- Viz: Providing off-the-shelf visualizations
- Analytics: Providing ways to do analysis on that data

Note that on Viz and Analytics we would imagine only providing limited
functionality of the demonstrator or essential kind. There are lots
of visualizations and analyses that can be done, and many ways to do
them, and OS as a technical project will only do a little.
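
To make the DB / ETL / Dumps split a bit more concrete, here is a
minimal sketch in Python, assuming the repository is just a directory
of CSV files (one per dataset). The field names, the `validate`,
`load` and `dump` functions and the file layout are all hypothetical
and only for illustration; this is not how OpenSpending currently
works.

```python
import csv
from pathlib import Path

# Hypothetical minimal schema; a real one would be much richer.
REQUIRED_FIELDS = ("dataset", "year", "payee", "amount")


def validate(row):
    """Return a cleaned row, or raise ValueError if it is unusable."""
    missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {
        "dataset": row["dataset"].strip(),
        "year": int(row["year"]),        # ValueError if not a number
        "payee": row["payee"].strip(),
        "amount": float(row["amount"]),  # ValueError if not a number
    }


def load(rows, repo_dir):
    """ETL step: validate rows and append them to one CSV per dataset."""
    repo_dir = Path(repo_dir)
    repo_dir.mkdir(parents=True, exist_ok=True)
    loaded, rejected = 0, []
    for row in rows:
        try:
            clean = validate(row)
        except ValueError as exc:
            # Keep rejected rows so a person can look at them later.
            rejected.append((row, str(exc)))
            continue
        path = repo_dir / f"{clean['dataset']}.csv"
        is_new = not path.exists()
        with path.open("a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=REQUIRED_FIELDS)
            if is_new:
                writer.writeheader()
            writer.writerow(clean)
        loaded += 1
    return loaded, rejected


def dump(repo_dir, dataset):
    """Dump step: read a whole dataset back as a list of dicts."""
    path = Path(repo_dir) / f"{dataset}.csv"
    with path.open(newline="") as fh:
        return list(csv.DictReader(fh))
```

The store itself does no analytics at all: it only has to accept
validated rows reliably and hand them back out. Anything smarter
(aggregation, viz) would sit on top of `dump` or an API.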

Aside: analogies with OpenStreetMap. I continue to find analogies with
OSM incredibly useful. Few people see OSM data or maps via
openstreetmap.org. Instead they see or use that data in sites or
products elsewhere (e.g. Foursquare). OSM's core is the central DB,
the data adding tools and the API/Dumps. Viz, even in the form of
essential things like Mapnik and tile production, now largely happens
in other projects that are a part of the community but not OSM "core".

## Implications

There’s more to think through here. These are just some immediate thoughts.

0. The DB is not necessarily a (relational) DB
  - We need something that we can reliably store data into, not something
that does all our analytics too. This could be flat files in S3.

1. Optimize ETL
  - Getting data in is essential
  - This is about people as much as tools
  - Maximize structure and reliability

2. We should not care about OS.org traffic or SEO for normal users.
What we care about is API usage.
  - We should start measuring API usage asap ...

3. Enabling people to build satellite sites or embed viz is our priority
  - We have made huge strides in this direction ... but we can do more
  - E.g. why focus on satellite sites in WordPress?
  - Make it easier to get data slices
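
As a rough illustration of points 2 and 3 together, here is a small
Python sketch of a "slice" call that also counts usage per client.
The `get_slice` function, the `api_usage` counter and the row fields
are hypothetical names made up for this example; a real API would
persist its metrics rather than keep them in memory.

```python
from collections import Counter

# Hypothetical in-memory usage metric, keyed by client identifier.
api_usage = Counter()


def get_slice(rows, client, **filters):
    """Return the rows matching every filter and record one call for `client`."""
    api_usage[client] += 1
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]


rows = [
    {"year": 2012, "dept": "health", "amount": 10.0},
    {"year": 2012, "dept": "defence", "amount": 5.0},
    {"year": 2013, "dept": "health", "amount": 7.0},
]

# A satellite site asks only for the slice it wants to present.
health_2012 = get_slice(rows, "news-site", year=2012, dept="health")
```

The point is that the number we would watch is the per-client call
count, not page views on OS.org.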

[1]: https://docs.google.com/a/okfn.org/document/d/1_NNfpMpAfaIg3eD3bNWNmgj5dmy6O9B2-ebV8h87sh8/edit#
