[openspending-dev] Experiment: flat-file aggregator "API"

Wed Apr 29 20:03:47 UTC 2015

Hi Friedrich,

Cubepress looks like a really good start on aggregate data in flat files. Nice work. As you know, this is part of the desired implementation for OS v2, and it at least looks like it could be used as part of the “spike solution” I proposed yesterday as a way to get OS v2 rolling (https://discuss.okfn.org/t/2015-near-term-technical-roadmap-for-openspending/264/2).

The OSEP 4 pull request Rufus refers to is one I’ve been working on here: https://github.com/openspending/osep/pull/13 (it probably still needs some work, and any comments would be welcome).

Best,

Paul

> On 29 Apr 2015, at 19:29, Friedrich Lindenberg <friedrich at pudo.org> wrote:
> 
> Hey,
> 
> On Wed, Apr 29, 2015 at 4:18 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 29 April 2015 at 14:22, Friedrich Lindenberg <friedrich at pudo.org> wrote:
> Hey all, 
> 
> I just wanted to send out a quick note to people on this list about a little hack I did to experiment with flat-file aggregates. The idea is to move OffenerHaushalt off the OpenSpending API, as the platform is currently unmaintained.
> 
> Re unmaintained. That isn't what has been said. The service and site are maintained and are monitored by a sysadmin. As set out in the Roadmap, we are not prioritising fixes to the loading system, as we want to move to the new, agreed setup.
> 
> OK, we probably mean the same thing.
>  
> In any case, delighted you are looking at flat-file stuff. Could this be contributed as part of the Roadmap for the OpenSpending "v2" platform?
>  
> There's a prototype for this idea that I hacked up under https://github.com/mapthemoney/cubepress - it would basically load a given CSV or Excel file into an (in-memory) database and then permute through all possible combinations of the API endpoint which might be accessed by a given set of visualisations, storing the result to name-coded JSON files.
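
A minimal sketch of the load-then-permute idea described above, not cubepress's actual code. It assumes a plain CSV input, uses the standard-library sqlite3 module as the in-memory database, and only permutes drill-down dimensions (no filters). The function name build_aggregates, the table name facts and the "aggregate__<dims>.json" naming scheme are all made up for illustration.

```python
import csv
import io
import itertools
import json
import os
import sqlite3


def build_aggregates(csv_text, dimensions, measure, out_dir):
    """Load CSV text into an in-memory SQLite table, then write one JSON
    file per combination of drill-down dimensions (hypothetical layout)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(dimensions) + [measure]

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (%s)" % ", ".join('"%s"' % c for c in columns)
    )
    conn.executemany(
        "INSERT INTO facts VALUES (%s)" % ", ".join(["?"] * len(columns)),
        [[r[d] for d in dimensions] + [float(r[measure])] for r in rows],
    )

    written = []
    # Every subset of the dimensions, from the grand total to the full drill-down.
    for size in range(len(dimensions) + 1):
        for combo in itertools.combinations(dimensions, size):
            select = ['"%s"' % d for d in combo] + ['SUM("%s")' % measure]
            sql = "SELECT %s FROM facts" % ", ".join(select)
            if combo:
                group = ", ".join('"%s"' % d for d in combo)
                sql += " GROUP BY %s ORDER BY %s" % (group, group)
            cells = [
                dict(zip(list(combo) + [measure], record))
                for record in conn.execute(sql)
            ]
            # Name-coded file: the drill-down dimensions are part of the name.
            name = "aggregate" + "".join("__" + d for d in combo) + ".json"
            with open(os.path.join(out_dir, name), "w") as fh:
                json.dump({"drilldown": list(combo), "cells": cells}, fh)
            written.append(name)

    conn.close()
    return written
```

A static front end could then fetch, say, aggregate__year.json instead of calling an aggregate endpoint with a drill-down on year. Adding filters to the permutation is what makes the number of files grow so quickly.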
> 
> Sounds interesting. Again, would it be worth seeing if this can align with or be part of the aggregation system for OS v2? Early stage outline at:
> 
> http://labs.openspending.org/osep/osep-07.html
> 
> Yep, although this document leaves me with plenty more questions than answers: is there a DB in there somewhere? If it's a DB  Is the relation between FTS and OLAP stronger in any way than just both providing "analytics"? 
>  
> This is based on a schema file, which includes an ultra-lightweight version of an OpenSpending mapping and model: 
> 
> https://github.com/mapthemoney/cubepress/blob/master/test_data/awards.yaml
> 
> I assume you've seen work on http://labs.openspending.org/osep/osep-04.html including the recent pull request.
> 
> Not sure which pull request you are referring to, please share a link?
> 
> From what I can tell, the data package format does very little in the way of making possible what cubepress is trying to do. Specifically, it would give me CSV column data types. It's a lot of overhead to have this full metadata spec just to get info on types.
> 
> The things not included are the actual data model (ie. the cube spec rather than the column spec), and the query hierarchies and filters needed to compute permutations. 
>  
> This YAML model would be complementary to the visualisation specs that I've been using to manage the OffenerHaushalt datasets until now: 
> 
> https://github.com/okfde/offenerhaushalt.de/blob/master/sites
> 
> The success of this approach is probably going to vary a lot: some datasets would get by with only a few thousand permutations, but the larger ones (like the German federal budget) will explode and yield millions or billions of permutations. This will probably fail on issues as basic as running out of file system inodes. 
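
A rough way to see how quickly the file count grows, under a simplified counting model that is my assumption and not how cubepress counts: each dimension is either ignored, used as a drill-down, or filtered to exactly one of its distinct values. A dimension with k distinct values then contributes k + 2 choices, and the choices multiply across dimensions. The function name and the cardinalities below are invented for illustration.

```python
from math import prod


def count_aggregate_files(cardinalities):
    """Estimate the number of pre-computed files for a cube.

    cardinalities: number of distinct values in each dimension.
    Assumed model: per dimension, ignore it, drill down on it,
    or filter to one of its k values (k + 2 options in total).
    """
    return prod(k + 2 for k in cardinalities)


# Hypothetical cubes: a small one and a much larger one.
small = count_aggregate_files([3, 10, 200])
large = count_aggregate_files([5, 20, 300, 5000, 40000])
print(small, large)
```

Since every file needs at least one inode, a count in the hundreds of millions or more is already a problem on many file systems before disk space even comes into it.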
> 
> Interesting to see how this goes then.
>  
> So, all in all, I'm not sure it's a good idea; a proper API still seems like the way to go (so I've also spent some time making a lightweight version of that, but more on that later). 
> 
> The tool isn't using cubes at the moment, mostly because I was on a train while writing it and couldn't download any dependencies. Which is somewhat nice, since it doesn't really have dependencies beyond messytables and sqlalchemy. 
> 
> Would love to hear what other operators of OpenSpending satellite sites think :) 
> 
> One blunt question: are you basically intent on forking here, or are you doing this as a way to continue to contribute to OpenSpending "v2"?
> 
> I wish I could give you a blunt answer to that one.
> 
> I'm not interested in OpenSpending v2: it strikes me as a) bureaucratic, b) unclearly governed and c) very likely to yield a data catalogue of some sort, rather than an end-user product. Out of these, (c) is by far my biggest concern: calling a data catalogue an ecosystem doesn't fundamentally change what it is.
> 
> At the same time, I'm really not interested in forking the OpenSpending community, either. I have neither the resources nor any institutional backing to do so, and even if I did, it would be a weird move.
> 
> But. I am interested in having a working API for spending data, a simple way to get stuff in there and some cool query tools (especially a flexible pivot table viewer). Those could be the ingredients to building out better interfaces to OffenerHaushalt, but also to some of the EU datasets which I maintain and would like to see more accessible.
> 
> If I saw this coming out of OS v2 I would contribute. But right now I think you're taking the most convoluted way imaginable to get there, all in the name of creating an ecosystem rather than a solution to a problem. While I hope that you actually will meet user needs in the end, it's a bit like playing darts when you're trying to draw a picture.
> 
> That's what I'm hacking on with spendb, as a late-night entertainment. At this point I've cut out a lot of old code (including most of the UI and FTS) and it's really clean and lean. I'm re-building the UI in Angular, in a way that allows users to change the cube model after data is loaded - which I hope will allow for quick iteration towards the best model.
> 
> I might deploy it for OffenerHaushalt at some point, but would prefer to frame this as something other than a direct competitor to OS. And obviously, if there are useful bits coming out of the OpenSpending v2 effort, then I'm glad to adopt them.
> 
> Currently, I have my eyes on goodtables, which looks very useful.
> 
> Best, 
> 
> - Friedrich 
> 