[openspending-dev] Micro-services: OpenSpending's future architecture

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Sun Dec 28 13:09:34 UTC 2014

Hey Tryggvi,

many thanks for your response. I'm really glad that we've shifted into a
discussion on target audiences. That seems to be a useful point, and it
wasn't mentioned in either the OSEP or the previous discussion (this is a
large part of what riles me about them).

Before I reply inline, however, let me be clear about one thing: I didn't
mean to suggest you had a hidden agenda at all, that idea is absurd and
offensive. I apologize for being unclear. What I meant was simply that I'm
not nearly as enthusiastic as you about the purely volunteer-driven,
unfunded model of running OpenSpending; it seems to me that this is less
likely to yield great services than a hybrid model.

Using the term "community-based" for the unfunded model creates an
inappropriate dichotomy, since there is no contradiction between a funded
core and community involvement (cf. GlobaLeaks, TOR, Mozilla, LibreOffice,
Wordpress). At the same time, I was trying to argue that the purely
unfunded model severely limits the ways in which OpenSpending might engage
with potential user groups - specifically that, almost by default, it will
produce a focus on developers as a target audience.

On Sun, Dec 28, 2014 at 1:39 AM, Tryggvi Björgvinsson <
tryggvi.bjorgvinsson at okfn.org> wrote:
> So here's the target audience of OpenSpending as I see it. OpenSpending
> is never going to be an end-user (citizen) facing solution. Budget and
> budget information is consumed via information intermediaries, entities
> that provide context (possibly after analysis) to budget data and by
> doing so help citizens understand it. Citizens (the ultimate target
> group of budget transparency) don't really care that much about the
> contents of the budgets per se, only after it has been provided to them
> with some context+analysis.


> These information intermediaries are various organisations, researchers,
> journalists, hackers, etc. Groups that either have access to or can get
> access to developers and data wranglers. OpenSpending is a tool to help
> this group, the information intermediaries do the analysis, get a hold
> of the budgets and do "their jobs" and as such needs to make their work
> easier.

This one is a bit more difficult: if non-technical intermediaries hire
technical staff to build out budget analysis tools, my experience has been
that they will usually re-invent the wheel or bring their own thing (toxic
mix of coder pride and grant funding, is my guess). The key exception to
this is when they get the whole thing for free, which is what the satellite
sites (kind-of) do.

It is probably worth trying to engage these hired technical people more
explicitly. That's partially a comms problem. The intermediary orgs will
usually already have the datasets which they want to analyse, so a data
repository doesn't really add value for them. Having an easy means to turn
that raw data into a rich API, on the other hand, is probably valuable -
stuff like the inflation code that you've added to OS.

That's why I'm unhappy to see all of that analysis stuff as a small fringe
box in the OSEP01 diagram, layered on top of that massive data package
processing wonderland.

> So our number one goal, imo, is to tailor to people who provide the
> budget data and the people who turn that budget data into something
> understandable to the public. As in, really easy to upload stuff and a
> plethora of options to analyse it, preferably in a way that can be
> replicated across datasets (which can save costs for information
> intermediaries).

I think the main difference between our positions is that I would prefer to
put an emphasis on non- and semi-technical intermediaries and analysts,
rather than other developers.

This is partially because I'm not sure how to get over the NIH mindset of
developers, but also because working with domain experts rather than
technical experts will give OS a better position to produce value: if you
implement domain logic, that is much more valuable than data packaged file

My point is that OSEP 01 should be a list of interesting budget analysis
problems to solve, not a discussion of how often we can transform different
types of CSV and JSON into each other. It will also mean that you end up
doing many of these backend things in a more task-focussed way: what
metadata to collect, what processing to perform.

Here's a few:

* Inflation adjustment [done]
* Great, pivot-table-like aggregate analysis (with FTS?)
* Automatic visualisation of the data based on rich semantic descriptions
of its meaning
* Interactive, crowd-sourced taxonomy alignment for cross-country
* Interactive, crowd-sourced company information reconciliation in
transactional data
* Alignment between transactional and budget data
* Support for more varied fiscal data types, e.g. procurement and
extractives info
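
To make the first two items a bit more concrete, here is a minimal sketch in
plain Python. It is not the inflation code that is in OS: the price index
values, the field names and both helper functions are made up for
illustration.

```python
from collections import defaultdict

# Hypothetical price index (illustrative values only, not real CPI data).
PRICE_INDEX = {2010: 100.0, 2011: 102.0, 2012: 105.0}


def adjust_for_inflation(amount, year, target_year, index=PRICE_INDEX):
    """Express `amount` from `year` in the prices of `target_year`."""
    return amount * index[target_year] / index[year]


def aggregate(entries, by, measure="amount"):
    """Sum `measure` over all entries, grouped by the fields named in `by`."""
    totals = defaultdict(float)
    for entry in entries:
        key = tuple(entry[field] for field in by)
        totals[key] += entry[measure]
    return dict(totals)


# Made-up sample entries.
entries = [
    {"year": 2010, "function": "health", "amount": 100.0},
    {"year": 2010, "function": "education", "amount": 50.0},
    {"year": 2012, "function": "health", "amount": 210.0},
]

# Restate every amount in 2012 prices, then total by function.
real_entries = [
    dict(e, amount=adjust_for_inflation(e["amount"], e["year"], 2012))
    for e in entries
]
totals = aggregate(real_entries, by=["function"])
```

Passing several field names in `by` (say `["year", "function"]`) gives the
kind of drill-down a pivot table would.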

> > But I just can't see very much evidence that it actually applies to
> > OpenSpending. The people who provide analytical services in this field
> > - let's name SpendNetwork and OpenGov.com - don't actually need to
> > access our large repository of data (or our APIs). Their customers are
> > cities, and these cities bring their own data (and APIs are easy to
> > code).
> Their customers (who they care about) are not the same as our indirect
> user (the citizen) so our approach is different. We're opening up the
> data to provide transparency and understanding for citizens (of course,
> that might be the goal of most cities who use SpendNetwork or
> OpenGov.com or the analytical services themselves, and if that's the
> case, they could also use OpenSpending). We don't restrict ourselves to
> any group, we just focus, imo, on helping information intermediaries to
> provide context and thus understanding to citizens.

But these are the information intermediaries! They are the people trying to
make money (or "create value") with the exact same thing that OpenSpending
is doing, and they don't need OpenSpending to do it. My point was that this
makes us different from OpenStreetMap (where companies like Mapbox,
Foursquare, Apple need and contribute to the data commons) and that we need
to get over this metaphor.

> > This makes OpenSpending unlike OpenStreetMap, and it makes developers
> > an unrealistic and unwilling target audience for the project. I think
> > the budgetary constraints on OpenSpending have led to a shift in
> > thinking. The discussion you're now having is not what problems need
> > to be solved, but: which ones are cheap to solve. Putting the code for
> > a bunch of APIs on GitHub and storing lots of CSVs on S3 is incredibly
> > cheap, I'm just not sure whose problem it solves.
> Let's make it super clear that the budgetary constraints on OpenSpending
> you are alluding to are that OpenSpending is a community project and
> Open Knowledge pays for its running costs which we are grateful for but
> that's perhaps not how we as a project want it to be run but that's a
> different discussion, so again, don't mix this into the discussion.
> We're not talking about this from a financial perspective.

I maintain that this criticism is valid, but of course it is not targeted
at you. It is targeted at a strategic shift within OKFN which I believe
badly hurts OpenSpending.

> I did add the CSV storage into the micro-services as something new (as
> an addition, not a replacement) not because I just want a solution that
> can store CSV files cheaply because something-something-which
> you-believe-but-I-don't-understand. I added it there because we don't
> have it at the moment. We rely on LINKS ON THE INTERNET if something
> happens to our database which is _not_ a resilient solution. Sure we
> could just hook a downloader into our existing loading mechanism but
> that would just be the same as putting the new underpants over the old
> ones you urinated in. In the end you'll just get burned as badly by the
> pee :-)

Straw man, you know that I want a better file organisation mechanism, too.
I think you're vastly overstating the effort required for a unified source
data storage mechanism, though - the boto code in loadkit is maybe two
dozen lines; let's say this ends up being a hundred when productised. I
have no real idea why all the weird data package stuff is required for data
storage, it'll basically duplicate the metadata we keep in our DB.
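
To give a sense of scale, here is a rough sketch of the kind of thing I mean.
It is not the actual loadkit code: the function names and the key layout are
assumptions, and `client` is any object with an
`upload_file(Filename, Bucket, Key)` method, which is modelled on the boto3 S3
client rather than the older boto interface.

```python
import hashlib
import os


def source_key(dataset, path):
    """Build a storage key from the dataset name and the file's checksum."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Read in chunks so large source files don't have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return "%s/sources/%s/%s" % (
        dataset, digest.hexdigest(), os.path.basename(path))


def store_source(client, bucket, dataset, path):
    """Upload one source file and return the key it was stored under."""
    key = source_key(dataset, path)
    client.upload_file(Filename=path, Bucket=bucket, Key=key)
    return key
```

Keying on the checksum means re-uploading an unchanged file is harmless, and
the key is the only extra thing the DB would need to remember per source.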

> Are we (the project) doing the in-context analysis in your
> "non-fictional" suggestion? I don't think we can. I do think we can
> provide context providers with the tools and data they need to do it
> (and I do think the big problem they face is that they don't have that
> access which is what we solve). That might be fiction, so please
> elaborate a little bit about what you're thinking.

Sorry to get back to this, but again: don't you think this "can't" is a
matter of resources? I would like to know what other factors you see as
stopping OpenSpending from addressing end-user (i.e. non-/semi-technical
infomediary) needs.

> Note that there still is a user-facing web site, it's just not the most
> important thing of OpenSpending as a project. In my opinion, the web
> site is the gateway into the project, not the project per se.

This goes back to my little rant on the suckiness of open source projects
at web services. Maybe this applies to us?



- Friedrich