[openspending-dev] Overview Diagram for OpenSpending Tech Work

Rufus Pollock rufus.pollock at okfn.org
Wed Oct 22 09:19:15 UTC 2014

On 20 October 2014 09:28, Friedrich Lindenberg <friedrich at pudo.org> wrote:

> Hey Rufus,
> nice diagram! Basically, the main change seems to be the data store of
> flat-file budget data packages? Just for fun and tradition’s sake, let me
> play the devil’s advocate :)

I don't think that was the main change indicated there (and it's not one
specifically covered in the diagram - though it is in other contexts e.g.
OSEP 2). The diagram was more an attempt to clarify what bits of the tech
stack we think OpenSpending "Core" should be focusing on (and standardizing

> While I understand that a repository of uniformly formatted datasets is
> kind of a cool asset, I wonder if this actually lies on any relevant user
> path? What is there to prevent this from becoming yet another data
> catalogue with little or no use cases?
> As it stands right now, I see the following main challenges for
> OpenSpending:

These seem good though I'd have some other points and a different emphasis.
However, I don't want to get us off-topic from the main part of the thread.

> (1) Operating the platform without an explicit revenue stream from
> commercial activity or providing a way for OpenSpending apps (e.g. bubbles)
> to run without the core platform.
> (2) Keeping the data up-to-date: I don’t want to find out who uploaded
> which dataset when and with which budget document, I just want to have
> access to the latest budget data for my country.
> (3) Developing new visualisations and modes of analysis (e.g. comparisons
> between city budgets, budgets over time).
> The diagram color-codes (3) to be someone else’s problem, which is fair
> enough. The budget data spec *could* play a significant role in this, by

It color-codes it as *primarily* outside of the OpenSpending tech team, yes. I
imagine some work is done here and there is clearly a strong connection with
the APIs that are provided etc.

> aligning the used classifications schemes across a set of budgets. I’m not
> sure whether this is intended to be addressed in the “Data Package
> Creation” API, but my guess is that this would really be a service on its
> own (I think Mark Brough is building something like this for aid sector
> spines). It would certainly be a great way of adding value to the data
> stored in OpenSpending.

I agree and we already have something like this -
https://github.com/openspending/bdp-uploader (plus integration of budget
package creation into CKAN being used by real users). Personally, I'd like
to see at least one service with good integration into OS DataStore so we
have good UX but I'm happy to see this happening generally.

> (2) is also something where BDP metadata could be useful, but I’d be
> surprised if a well-sorted s3 bucket was really all that was needed to
> solve this challenge. It seems to me that this is really more of a
> community management issue, where

Aren't we straw-manning a little bit here ;-): the s3-as-datastore has
never been intended to solve e.g. the problem of keeping data up to date
(and never has). As you say, it is a community and data scraping issue. I *do*
think that the s3 datastore can make this a bit easier in various ways but
they are minor compared to actually writing the scrapers.

I should also reiterate that the s3-for-datastore is a separate question to
the main purpose of the diagram, so perhaps we should boot a
separate thread for this specific discussion - which seems valuable and
important ...

> we need people not to duplicate work, publish recipes for data extraction,
> have a clear schedule to supply updates and a review process etc. etc. In
> any case, I don’t feel like we should try to approach it as a file system
> problem, but instead think about the types of processes necessary to get it
> done.


> (1) is the hardest part, but also the most urgent. The data store doesn’t
> worsen this problem (running it would probably be heroku-level/free), but
> the proposed architecture also doesn’t do away with the need for the
> expensive bits: the API and search system.

Agreed, but by breaking things up a bit we make it much easier for folks to
work on different bits (both because they are smaller and more independent).

> I’ve argued before that we probably wouldn’t lose much over just turning
> off the FTS index (perhaps with a grace period for DGU). As for the API,
> the question for me is whether we can flat-file the aggregator output in
> some standard way. The BDP can help here, but there would still need to be
> some sort of drilldown generator UI that builds a data package into an
> in-memory OLAP cube, generate lots of JSON snippets on S3 and then go drink.

These are definitely the right questions to ask. At the moment I don't know
the answer - can we "cache" effectively (i.e. flat-file), do we still use our
current Postgres OLAP, do we try out BigQuery or Redshift or ...
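
To make the flat-file option a bit more concrete, here is a rough sketch of
what "cache the aggregator output" could look like: take the rows of a budget
dataset, pre-compute totals for every combination of dimensions, and emit one
JSON snippet per drilldown. Everything here is an assumption for illustration:
the column names (year, function, amount), the file naming (by-year.json etc.)
and the function build_drilldowns are hypothetical, and writing the snippets
to S3 is left out. It is not how the current OpenSpending aggregator works.

```python
import csv
import io
import json
from collections import defaultdict
from itertools import combinations


def build_drilldowns(rows, dimensions, measure="amount"):
    """Pre-compute totals for every combination of dimensions.

    Returns a dict mapping a (hypothetical) file name to a JSON string,
    i.e. the snippets one would upload to a flat-file store.
    """
    snippets = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += float(row[measure])
            name = "by-" + "-".join(dims) if dims else "total"
            cells = [
                dict(zip(dims, key), **{measure: value})
                for key, value in sorted(totals.items())
            ]
            snippets[name + ".json"] = json.dumps(cells)
    return snippets


# Tiny made-up dataset standing in for the data file of a budget data package.
SAMPLE = """year,function,amount
2013,health,100
2013,education,50
2014,health,120
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
snippets = build_drilldowns(rows, ["year", "function"])
for name in sorted(snippets):
    print(name, snippets[name])
```

The number of snippets grows as 2^n in the number of dimensions, so this only
answers the "can we cache effectively" question for datasets with a handful
of dimensions; beyond that a real OLAP backend probably still has a role.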

The point of the diagram was simply to make this a specific, distinct
component and say that OS should be responsible for a good bit of this.
The next step would be to boot a doc and do a first pass at what we think our
first effort should be (which could be using what we currently have).

> If you plan to re-arrange OpenSpending around the BDP, you need to demo
> that it is actually useful in solving these types of challenges - proving
> that you can encode data in this format alone will not sell it :)

"re-arrange around the BDP" seems a bit strong to me in terms of where
we should go. I think we want BDP to be one component of the solution but
there's (lots of) other stuff. Also check out OSEP 4 about the things we
may want to add to a BDP to be useful in OS.

> In any case, I guess my main point is that I would find it more helpful to
> discuss a diagram of user activities rather than this one where all the
> verbs are on the fringes (Acquisition, Clean, Load, Analysis, Presentation)
> and the center is all nouns (API, DataStore, Write UI) :)

That's a good point - we could do a different diagram that way round.
The source for the current diagram is in the open tech team directory
<https://drive.google.com/a/okfn.org/#folders/0B6R8dXc6Ji4JSGFVSElLM1RJdzQ> so
we could boot a new diagram in there (or fork the current one and show
processes rather than products).

> Don’t take this the wrong way, it’s meant with the best of intentions and
> I’m very excited to see BDP becoming an important tool!

:-) ++

this discussion is super useful - let's keep it going.

I should reiterate that whilst I think BDP is super-useful it is only one
component of the solution :-)



> - Friedrich
> On 15 Oct 2014, at 21:27, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> Hi All,
> Based on discussion with Tryggvi here is an overview of OpenSpending tech
> work that summarises some of the points in OSEP 1 and OSEP 2. Would be
> great to get people's thoughts.
> I've also booted an issue in OSEP repo which may be best place to discuss:
> https://github.com/openspending/osep/issues/2
> Rufus


Rufus Pollock
Founder and President | skype: rufuspollock | @rufuspollock <https://twitter.com/rufuspollock>
Open Knowledge <http://okfn.org/> - see how data can change the world
http://okfn.org/ | @okfn <http://twitter.com/OKFN> | Open Knowledge on Facebook <https://www.facebook.com/OKFNetwork> | Blog <http://blog.okfn.org/>

The Open Knowledge Foundation is a not-for-profit organisation.  It is
incorporated in England & Wales as a company limited by guarantee, with
company number 05133759.  VAT Registration № GB 984404989. Registered
office address: Open Knowledge Foundation, St John’s Innovation Centre,
Cowley Road, Cambridge, CB4 0WS, UK.