[openspending-dev] OpenSpending - Thoughts on Approach and Architecture
rufus.pollock at okfn.org
Fri Apr 19 16:37:46 UTC 2013
Thanks to folks who commented in the doc. I'd be eager to hear any
additional feedback -- this proposal has non-trivial implications for how
OpenSpending would do stuff :-)
I've also now prepped some notes on the ETL in
line with this process and produced this overview diagram:
[image: Inline images 1]
I'd be interested to hear folks' thoughts!
On 4 April 2013 21:28, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> I wanted to put down some reflections that distil my understanding
> (and thoughts) on where we are going with our approach and
> architecture.
> [Note: I've also put this in a gdoc version to make annotating /
> commenting easier.]
> Single statement summary:
> We want to centralize data but decentralize "presentation" ("views")
> By “presentation” (views) I mean presentations of that data to people
> in the broadest sense - it could be a visualization and discussion in
> a news article or a dedicated site like Where Does My Money Go.
> To elaborate this a bit, it means:
> 1. OS provides a single central repository of open data on government
> (and corporate) finances
> 2. OS provides good access (APIs, dumps) but quite basic presentation
> of that data (browser, some viz)
> 3. Most of the presentation of that data happens on non-OS sites but
> using OS data (via the API, via dump etc)
> Some of 3 may be done by members of the "OpenSpending" community and
> we care a great deal about 3 (that stuff is the point of having 1+2)
> BUT OS, at least as a technical project, is focused on 1+2.
> This means OpenSpending technically is about:
> - DB: Maintaining that central repository (note this need *not* be a
> classic relational DB - it could be files on s3 or ...)
> - ETL: Providing means to get data into that repository (ETL)
> - API + Dumps: Providing means to get data out of that repository
> - Viz: Providing off the shelf visualizations
> - Analytics: Providing ways to do analysis on that data
> Note that on Viz and Analytics we would imagine only providing limited
> functionality of the demonstrator or essential kind - there are lots
> of visualizations and analyses that can be done and many ways to do it
> and OS as a technical project will only do a little.
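To make the division of labour above concrete, here is a minimal sketch of the DB / ETL / API + Dumps / Analytics split, assuming the "DB" is nothing more than a directory of CSV files (a local temp directory stands in for something like S3). The schema, function names and sample rows are all hypothetical and are not taken from the OpenSpending codebase.

```python
import csv
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("date", "payee", "amount")  # hypothetical minimal schema


def etl_load(repo: Path, dataset: str, raw_rows):
    """ETL: validate and normalise raw rows, then store them as one CSV file."""
    cleaned = []
    for i, row in enumerate(raw_rows):
        missing = [f for f in REQUIRED_FIELDS if not str(row.get(f, "")).strip()]
        if missing:
            raise ValueError(f"row {i} is missing fields: {missing}")
        cleaned.append({
            "date": str(row["date"]).strip(),
            "payee": str(row["payee"]).strip(),
            # Normalise amounts such as "1,200.50" to a plain number.
            "amount": float(str(row["amount"]).replace(",", "")),
        })
    path = repo / f"{dataset}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=REQUIRED_FIELDS)
        writer.writeheader()
        writer.writerows(cleaned)
    return path


def dump(repo: Path, dataset: str):
    """API + Dumps: return every stored row of a dataset."""
    with (repo / f"{dataset}.csv").open(newline="", encoding="utf-8") as fh:
        return [dict(r, amount=float(r["amount"])) for r in csv.DictReader(fh)]


def total_by(rows, field):
    """Analytics: sum amounts grouped by one field (deliberately basic)."""
    totals = {}
    for row in rows:
        totals[row[field]] = totals.get(row[field], 0.0) + row["amount"]
    return totals


repo = Path(tempfile.mkdtemp())  # stand-in for the central store
etl_load(repo, "demo", [
    {"date": "2013-01-05", "payee": "Acme Ltd", "amount": "1,200.50"},
    {"date": "2013-02-11", "payee": "Acme Ltd", "amount": "300"},
    {"date": "2013-02-12", "payee": "Bolt plc", "amount": "99.5"},
])
rows = dump(repo, "demo")
totals = total_by(rows, "payee")
```

The store here only has to accept and return rows reliably; the one analytic is a separate function working on dumped rows, which is the kind of separation the list above describes.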
> Aside: analogies with OpenStreetMap. I continue to find analogies with
> OSM incredibly useful. Few people see OSM data or maps via
> openstreetmap.org. Instead they see or use that data in sites or
> products elsewhere (e.g. FourSquare). OSM's core is the central DB,
> the data adding tools and the API/Dumps. Viz even in the form of
> essential things like mapnik and tile production now largely happens
> in other projects that are a part of the community but not OSM "core".
> ## Implications
> There’s more to think through here. These are just some immediate thoughts.
> 0. The DB is not necessarily a (relational) DB
> - We need something that we can reliably store data in, not something
> that also does all our analytics. This could be flat files in S3.
> 1. Optimize ETL
> - Getting data in is essential
> - This is about people as much as tools
> - Maximize structure and reliability
> 2. We should not care about OS.org traffic or SEO for normal users.
> What we care about is API usage.
> - We should start measuring API usage asap ...
> 3. Enabling people to build satellite sites or embed viz is our priority
> - We have made huge strides in this direction ... but we can do more
> - E.g. why focus on satellite sites in WordPress?
> - Make it easier to get data slices
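
Points 2 and 3 can be sketched together: a "data slice" call that filters rows by field values, wrapped so that every call is tallied as API usage. This is only an illustration under stated assumptions: `api_usage`, `counted` and `get_slice` are hypothetical names, the counter lives in memory, and the rows are made-up sample data rather than a real dataset.

```python
from collections import Counter
from functools import wraps

api_usage = Counter()  # hypothetical in-memory tally; a real service would persist it


def counted(endpoint):
    """Decorator that records one hit per call under the given endpoint name."""
    def wrap(func):
        @wraps(func)
        def inner(*args, **kwargs):
            api_usage[endpoint] += 1
            return func(*args, **kwargs)
        return inner
    return wrap


@counted("slice")
def get_slice(rows, **filters):
    """Return only the rows whose fields equal every given filter value."""
    return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


# Made-up sample rows, for illustration only.
ROWS = [
    {"year": 2012, "dept": "health", "amount": 10.0},
    {"year": 2013, "dept": "health", "amount": 12.0},
    {"year": 2013, "dept": "transport", "amount": 7.0},
]

health_2013 = get_slice(ROWS, year=2013, dept="health")
all_2013 = get_slice(ROWS, year=2013)
```

After these two calls `api_usage["slice"]` is 2, so the same hook that serves a satellite site also yields the usage numbers worth tracking.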