[openspending-dev] OpenSpending - Thoughts on Approach and Architecture

Fri May 3 23:35:37 UTC 2013

On Fri 3 May 2013 17:24, Rufus Pollock wrote:

> I think I'm a bit confused by the term "publishing". My point is that
> OS here is *not* *directly* about helping users analyse and publish
> their results - just as OpenStreetMap is not directly about helping
> users analyse and publish geodata. Rather it is about creating a
> consolidated *open* database of information that others can easily
> contribute to and use (to do analysis and presentation).

By publishing I mean putting the results out there, either via satellite
sites that focus on a subset of our datasets (like a specific country,
city etc.) or in a piece of news where journalists explain or raise
concerns about something.

I don't think that it's our job to create those satellite sites or write
those stories but we should try to make it as easy as we can for others
to do it. This we do by providing them with tools to work with the data,
to dive into the data etc. We can't really offer them a "download the
dataset for your country here" button.

The difference between OpenSpending and OpenStreetMap is that in
OpenStreetMap people can add and improve data. In OpenSpending you only
add static data. OpenSpending doesn't allow users to do any data
wrangling (modifications). It's like an OpenStreetMap where you can only
add new countries but not update them (well, not quite, since the number
of countries is limited but the number of fiscal years is not).

Why do we provide the OLAP cube or the visualisations if we only want to
be a database of spending data? I know the OLAP cube provides a way to
get the data out efficiently, but I'd say that for your idea it's
the wrong approach. We should focus more on trying to standardise the
data structure and then provide simple ways to fetch data via that
standardisation (if we continue with the OpenStreetMap analogy - the
standardisation would be about marking/annotating the data just like you
can mark highways in OpenStreetMap).

> Hmmm. I think I was trying to say something a bit different:
> specifically that the purpose of the "OS" project (and hence
> associated software) should *not* be to produce a fast analytical
> machine for financial data. Others will do that far better and the
> range and type of analytical requirements is too large for us to
> effectively support. Instead, our goal should be to provide a
> "rock-solid" database (in the broadest sense - not necessarily a
> RDBMS) and related tooling (to get data in and some examples of how
> to get data out and displayed and analyzed - but the latter tooling
> will likely to be fairly limited).

I think we have to think about what the OLAP cube is about. The two of
us have different perspectives on the project. You find the database to
be the most important part, I find it to be the OLAP cube. I might be
placing my focus on the wrong bit and therefore I'd like to hear your
opinion on the cube.

> I'm not sure I understand the distinction completely but the key
> thing for me is what we do and focus on.

Let's first agree on what services OpenSpending provides. Then I can
come back to the distinction (it depends on what we're doing).

> A key point for me is that much of the presentation gets done by
> others (just like OpenStreetMap). I also think it is important to
> see OpenSpending as a project with some software to support it than
> as primarily a piece of software. I say this because it means this
> isn't just a classic software product.

Yes, but if we focus only on the database part and not on the analytical
processing, people will have to fill that void somehow. That might
happen and let's say it does. What's the point then of having a central
database?

Let's say that I want to put up a presentation for Icelandic spending
data. I have the data on the Icelandic government's CKAN instance. How
do I get from there to a visualisation site? If I understand you
correctly, you're suggesting that I upload the data to OpenSpending and
then use some other analytical processing software to get the data from
OpenSpending and from the results of that I can create my satellite site
(or write my news article).

Why couldn't the analytical processing software be a small script I
run on my machine to go through the data and get the information I want,
and why can't it just run directly on the data that's on the CKAN
instance? I don't see how OpenSpending as a consolidated database helps
me in any way. I don't really care about data from other countries
unless I want to compare the countries (but I can't be sure that the
data is there or that it's structured in a way I can use for comparison).
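
As a rough illustration of the kind of small local script meant here,
the sketch below sums amounts per year and per ministry from a CSV
export. Everything in it is an assumption made for the example: the
column names (year, ministry, amount), the sample rows, and the helper
names load_rows and total_by. It deliberately leaves out how the file
is fetched from a CKAN instance.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample standing in for a CSV file downloaded from a data portal.
# The column names and figures are made up for illustration.
SAMPLE_CSV = """year,ministry,amount
2012,Health,100.50
2012,Education,200.00
2013,Health,150.25
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def total_by(rows, key):
    """Sum the 'amount' column, grouped by the column named in `key`."""
    totals = defaultdict(Decimal)  # Decimal() starts each group at 0
    for row in rows:
        totals[row[key]] += Decimal(row["amount"])
    return dict(totals)


rows = load_rows(SAMPLE_CSV)
by_year = total_by(rows, "year")
by_ministry = total_by(rows, "ministry")
```

A script like this only works for one dataset's layout; pointing it at
another country's file means editing the column names by hand.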

If we're working on standardisation of the spending data, marking it
correctly etc., then I do see the benefit of OpenSpending. The analytical
processing software could then use standard methods to get data and we'd
be able to compare countries, cities, etc. That's an entirely different
project focus and it can be hard to add information (you can't really go
out with a GPS receiver to get more data about a specific transaction -
in some places you might be able to do it with FOI acts but that's not
certain).
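
A minimal sketch of what such standardisation could look like in
practice: each dataset gets a mapping from its own column names onto a
shared set of field names, so one piece of analysis code can read all
of them. The dataset keys, the source column names, the shared field
names (time, amount, entity) and the standardise helper are all
hypothetical and not part of OpenSpending.

```python
from decimal import Decimal

# Hypothetical per-dataset mappings: source column name -> shared field name.
# "time" and "amount" are treated as the two fields every dataset must have.
MAPPINGS = {
    "dataset_a": {"year": "time", "total": "amount", "ministry": "entity"},
    "dataset_b": {"fiscal_year": "time", "value": "amount", "department": "entity"},
}


def standardise(row, mapping):
    """Rename a row's columns to the shared field names and parse the amount.

    Raises KeyError if the row lacks a column that the mapping expects.
    """
    out = {target: row[source] for source, target in mapping.items()}
    out["amount"] = Decimal(out["amount"])
    return out


row_a = standardise(
    {"year": "2012", "total": "100.50", "ministry": "Health"},
    MAPPINGS["dataset_a"],
)
row_b = standardise(
    {"fiscal_year": "2012", "value": "10.5", "department": "Health"},
    MAPPINGS["dataset_b"],
)
```

Once both rows share the same field names, comparing them no longer
depends on how each source happened to label its columns.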

So I don't see the point of focusing on a central database with
differently structured data (in OpenSpending the only two necessary
dimensions are time (which is often inaccurate) and amount). I see the
central database as an extra step that gives me no real value. The real
value I see is helping people work with their data. I might be wrong.
There might be others who find this to be really valuable (and please
correct me if I'm wrong).

I'm not stubborn, just lost in the discussion ;-)

/Tryggvi
