[openspending-dev] Experiment: flat-file aggregator "API"

Wed Apr 29 16:29:20 UTC 2015

Hey,

On Wed, Apr 29, 2015 at 4:18 PM, Rufus Pollock <rufus.pollock at okfn.org>
wrote:

> On 29 April 2015 at 14:22, Friedrich Lindenberg <friedrich at pudo.org>
> wrote:
>
>> Hey all,
>>
>> I just wanted to send out a quick note to people on this list about a
>> little hack I did to experiment with flat-file aggregates. The idea is to
>> move OffenerHaushalt off the OpenSpending API, as the platform is currently
>> unmaintained.
>>
>
> Re "unmaintained": that isn't what has been said. The service and site are
> maintained and monitored by a sysadmin. As set out in the Roadmap, we are
> not prioritising fixes to the loading system, as we want to move to the
> new, agreed setup.
>

OK, we probably mean the same thing.

> In any case, I'm delighted you are looking at flat-file stuff. Could this
> be contributed as part of the Roadmap for the OpenSpending "v2" platform?
>
>
>> There's a prototype for this idea that I hacked up at
>> https://github.com/mapthemoney/cubepress. It basically loads a given CSV
>> or Excel file into an (in-memory) database and then permutes through all
>> possible combinations of API endpoint calls that might be made by a given
>> set of visualisations, storing the results in name-coded JSON files.
>>
>
> Sounds interesting. Again, would it be worth seeing whether this can align
> with, or be part of, the aggregation system for OS v2? Early-stage outline at:
>
> http://labs.openspending.org/osep/osep-07.html
>

Yep, although this document leaves me with far more questions than
answers: is there a DB in there somewhere? If so, is the relation between
FTS (full-text search) and OLAP any stronger than both just providing
"analytics"?

>> This is based on a schema file, which includes an ultra-lightweight
>> version of an OpenSpending mapping and model:
>>
>> https://github.com/mapthemoney/cubepress/blob/master/test_data/awards.yaml
>>
>
> I assume you've seen the work on
> http://labs.openspending.org/osep/osep-04.html, including the recent pull
> request.
>

Not sure which pull request you are referring to; could you share a link?

From what I can tell, the data package format does very little to make
possible what cubepress is trying to do. At most, it would give me the CSV
column data types, and carrying a full metadata spec just to get type
information is a lot of overhead.

It includes neither the actual data model (i.e. the cube spec rather than
the column spec) nor the query hierarchies and filters needed to compute
the permutations.
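
A minimal sketch of the precomputation described in the quoted message (this is not cubepress's actual code): load rows into an in-memory database, enumerate every drilldown combination the cube model allows, and write each aggregate to a JSON file whose name encodes the query. The `MODEL` dict, its keys, the sample rows and the `drilldown-<dims>.json` naming scheme are hypothetical stand-ins for a cube spec and real data. It uses the standard library's `sqlite3` instead of messytables/sqlalchemy, and covers plain drilldowns only, not filters or hierarchies.

```python
import itertools
import json
import os
import sqlite3
import tempfile

# Hypothetical cube model: which columns are dimensions and which is the measure.
# This stands in for a "cube spec"; it is NOT cubepress's actual YAML format.
MODEL = {
    "table": "spending",
    "dimensions": ["year", "ministry", "sector"],
    "measure": "amount",
}

# Hypothetical sample rows; in practice these would be read from a CSV/Excel file.
ROWS = [
    (2014, "Finance", "Admin", 100.0),
    (2014, "Health", "Hospitals", 250.0),
    (2015, "Finance", "Admin", 120.0),
    (2015, "Health", "Hospitals", 300.0),
    (2015, "Health", "Research", 80.0),
]


def load(conn):
    """Load the sample rows into an in-memory SQLite table."""
    cols = MODEL["dimensions"] + [MODEL["measure"]]
    conn.execute(f"CREATE TABLE {MODEL['table']} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {MODEL['table']} VALUES ({placeholders})", ROWS)


def drilldowns(dimensions):
    """Yield every subset of dimensions, from the grand total () to all of them."""
    for size in range(len(dimensions) + 1):
        yield from itertools.combinations(dimensions, size)


def aggregate(conn, dims):
    """Sum the measure grouped by the given dimensions (no GROUP BY for ())."""
    measure = MODEL["measure"]
    select = ", ".join(list(dims) + [f"SUM({measure}) AS {measure}"])
    sql = f"SELECT {select} FROM {MODEL['table']}"
    if dims:
        sql += " GROUP BY " + ", ".join(dims) + " ORDER BY " + ", ".join(dims)
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur.fetchall()]


def precompute(out_dir):
    """Write one JSON file per drilldown; the filename encodes the query."""
    conn = sqlite3.connect(":memory:")
    load(conn)
    written = []
    for dims in drilldowns(MODEL["dimensions"]):
        name = "drilldown-" + ("+".join(dims) or "total") + ".json"
        with open(os.path.join(out_dir, name), "w") as fh:
            json.dump({"drilldown": list(dims), "cells": aggregate(conn, dims)}, fh)
        written.append(name)
    conn.close()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = precompute(tmp)
        print(len(files))  # 8: every subset of the 3 dimensions
        with open(os.path.join(tmp, "drilldown-year.json")) as fh:
            print(json.load(fh)["cells"])
```

With three dimensions this writes 2^3 = 8 files; every additional dimension doubles the count before any filters are considered.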

>> This YAML model would be complementary to the visualisation specs that
>> I've been using to manage the OffenerHaushalt datasets until now:
>>
>> https://github.com/okfde/offenerhaushalt.de/blob/master/sites
>>
>> How well this approach works is probably going to vary a lot: some
>> datasets would get by with only a few thousand permutations, but the
>> larger ones (like the German federal budget) will explode and yield
>> millions or billions of permutations. This will probably fail on issues
>> as basic as running out of file system inodes.
>>
>
> It will be interesting to see how this goes, then.
>
>
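
A back-of-the-envelope sketch of why the file count explodes, under an assumed endpoint model: each dimension is either ignored, drilled down, or filtered to one of its c values, so it contributes (c + 2) choices and the choices multiply across dimensions. The three-state model and the cardinalities below are hypothetical and are not taken from cubepress or from the German federal budget. Since each endpoint becomes one file, the product is also roughly the number of inodes needed.

```python
import itertools
import math


def endpoint_count(cardinalities):
    """Closed form: each dimension is ignored, drilled down, or filtered to one
    of its c values, i.e. (c + 2) choices per dimension (assumed model)."""
    return math.prod(c + 2 for c in cardinalities)


def enumerate_endpoints(cardinalities):
    """Brute-force enumeration of the same endpoints (only feasible for tiny cubes)."""
    per_dim = [
        ["ignore", "drilldown"] + [("filter", value) for value in range(c)]
        for c in cardinalities
    ]
    return list(itertools.product(*per_dim))


if __name__ == "__main__":
    # Tiny cube: brute force and closed form agree.
    assert len(enumerate_endpoints([2, 3])) == endpoint_count([2, 3]) == 20
    # Hypothetical small dataset: three dimensions with 5, 10 and 20 members.
    print(endpoint_count([5, 10, 20]))  # 1848
    # Hypothetical larger dataset: six dimensions with 30 members each.
    print(endpoint_count([30] * 6))  # 1073741824
```

Six dimensions of 30 members each already give 32^6 = 2^30, a little over a billion files.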
>> So, all in all, I'm not sure it's a good idea; a proper API still seems
>> like the way to go (so I've also spent some time making a lightweight
>> version of that, but more on that later).
>>
>> The tool isn't using cubes at the moment, mostly because I was on a train
>> while writing it and couldn't download any dependencies. That is somewhat
>> nice, since it doesn't really have dependencies beyond messytables and
>> sqlalchemy.
>>
>> Would love to hear what other operators of OpenSpending satellite sites
>> think :)
>>
>
> One blunt question: are you basically intent on forking here, or are you
> doing this as a way to continue contributing to OpenSpending "v2"?
>

I wish I could give you a blunt answer to that one.

I'm not interested in OpenSpending v2; it strikes me as (a) bureaucratic,
(b) unclearly governed and (c) very likely to yield a data catalogue of some
sort rather than an end-user product. Of these, (c) is by far my biggest
concern: calling a data catalogue an ecosystem doesn't fundamentally change
what it is.

At the same time, I'm really not interested in forking the OpenSpending
community, either. I have neither the resources nor any institutional
backing to do so, and even if I did, it would be a weird move.

But. I am interested in having a working API for spending data, a simple
way to get stuff in there and some cool query tools (especially a flexible
pivot table viewer). Those could be the ingredients for building out better
interfaces to OffenerHaushalt, but also to some of the EU datasets that I
maintain and would like to see made more accessible.

If I saw this coming out of OS v2, I would contribute. But right now I think
you're taking the most convoluted route imaginable to get there, all in the
name of creating an ecosystem rather than solving a problem. While I hope
that you will actually meet user needs in the end, it's a bit like playing
darts when you're trying to draw a picture.

That's what I'm hacking on with spendb, as late-night entertainment. At
this point I've cut out a lot of old code (including most of the UI and
FTS) and it's really clean and lean. I'm rebuilding the UI in Angular, in
a way that allows users to change the cube model after the data is loaded,
which I hope will allow quick iteration towards the best model.

I might deploy it for OffenerHaushalt at some point, but would prefer to
frame it as something other than a direct competitor to OS. And
obviously, if useful bits come out of the OpenSpending v2 effort, I'll be
glad to adopt them.

Currently, I have my eyes on goodtables, which looks very useful.

Best,

- Friedrich