[wdmmg-discuss] COINS FOI requests (fwd)

Tue Apr 13 12:54:43 UTC 2010

List, meet Donovan. He has volunteered the following advice on the 
aggregator. (I didn't know about Pentaho).

 	Alistair

---------- Forwarded message ----------
On Tue, 13 Apr 2010, Francis Irving wrote:

> I've copied this to Alistair, as he ought to know about pentaho if he
> doesn't already!
> 
> Donny - Alistair is working on the code behind the WDMMG store.
> 
> Alistair - Donny is a chap Julian found in Liverpool who used to work
> in commodities trading, and is interested in helping WDMMG.
> 
> Donny, make sure you are on the WDMMG discussion list :)
> 
> Francis
> 
> On Tue, Apr 13, 2010 at 11:11:22AM +0100, Donovan Hide wrote:
>> Hi Lisa,
>> 
>> thanks for the reply. I realise that my questions might have been a bit
>> vague!! The second question was really trying to get at how to process the
> >> data in whatever form it might arrive in. I had seen your (very
>> impressive!) visualisation and think it's great. I've done some Flex stuff
>> recently and can appreciate how much work it must have been to get
>> everything laid out nicely.
>> 
> >> The real nub of my point was to highlight that the data is multi-dimensional
> >> in nature. This means that different hierarchies can be combined in
> >> arbitrary ways to slice and dice the cube of data and answer questions that
> >> could be expressed in plain English. An example might be:
>> 
>> Show me all spending on schools in Merseyside between 2006 and 2008 compared
>> with that in Hackney for the same time period.
>> 
>> or the more complex:
>> 
> >> What is the rate of increase in nuclear decommissioning expenditure over the
> >> last 5 years compared with the year to date?
>> 
>> This is what an OLAP system provides a basis for:
>> 
>> http://en.wikipedia.org/wiki/Online_analytical_processing
>> 
>> and is why the Treasury would have chosen a commercial product like
>> Terasolve. The technical problems that these systems solve include:
>> 
> >> 1. Fast queries over arbitrary combinations of dimensions, achieved by
> >> pre-aggregating data
> >> 2. A query language (MDX) that can express combinations of dimensions and
> >> the calculations on them that make the data useful
>> 3. The ability to change the layout of dimensions as, for example,
>> government departments merge with or subsume others
>> 
> >> In the past I've worked with both commercial OLAP servers and hand-coded,
> >> on-the-fly aggregators. Both approaches have definite advantages. Looking at
>> the store prototype:
>> 
>> http://store.wheredoesmymoneygo.org
>> 
>> I can see a nice domain-driven model sitting on a key-value store (CouchDB?)
> >> which could be exposed REST-fully, which is great. The technical problem I
> >> can foresee, though, is that the most common query you might receive asks
> >> for the sum of all 23 million transactions at the top-most level of every
> >> dimension. Without pre-aggregation, the simplest query could be the most
> >> resource-intensive one. Hence OLAP was invented!!
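To make the pre-aggregation point concrete, here is a minimal sketch assuming a fact table with three invented dimensions (`department`, `region`, `year`). It adds each row's amount into every cell where each dimension is either the row's own value or a roll-up marker (`ALL`, a name chosen here), so the top-level total becomes one dictionary lookup instead of a scan over every transaction. With d dimensions each row touches 2**d cells, which is why real OLAP servers typically materialise only selected aggregates.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker meaning "rolled up over this dimension"
DIMS = ("department", "region", "year")


def build_cube(rows, dims=DIMS):
    """Pre-aggregate once: each row contributes to 2**len(dims) cells."""
    cube = defaultdict(float)
    for row in rows:
        # For each dimension, the row counts both under its own value and under ALL.
        choices = [(row[d], ALL) for d in dims]
        for cell in product(*choices):
            cube[cell] += row["amount"]
    return cube


# Invented transactions standing in for the real ones.
rows = [
    {"department": "Dept A", "region": "North", "year": 2007, "amount": 10.0},
    {"department": "Dept A", "region": "South", "year": 2008, "amount": 20.0},
    {"department": "Dept B", "region": "North", "year": 2007, "amount": 30.0},
    {"department": "Dept B", "region": "North", "year": 2008, "amount": 40.0},
]

cube = build_cube(rows)
print(cube[(ALL, ALL, ALL)])       # 100.0: the top-level total is now a lookup
print(cube[("Dept A", ALL, ALL)])  # 30.0
print(cube[(ALL, "North", 2008)])  # 40.0
```

Because `cube` is a `defaultdict`, looking up a combination that never occurred returns 0.0 rather than raising an error.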
>> 
>> The other issue that could arise is how to map a dump of their OLAP database
>> to a different schema, and what to do for subsequent dumps if the layout of
>> dimensions has changed. Dimensions are generally fluid and dynamic by
>> nature.
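One common way to cope with shifting dimensions is to keep a mapping from the codes used in an older dump to the current layout and apply it at load time. The sketch below assumes invented department codes (`D01` to `D04`) and a hypothetical `MERGES` table; it follows chains of merges so that a department absorbed twice still lands under its current parent.

```python
from collections import defaultdict

# Hypothetical reorganisations: D02 was subsumed by D01, and D04 by D02.
MERGES = {"D02": "D01", "D04": "D02"}


def resolve(code, merges):
    """Follow a chain of merges to the current department code.
    The `seen` set stops the loop if the table accidentally contains a cycle."""
    seen = set()
    while code in merges and code not in seen:
        seen.add(code)
        code = merges[code]
    return code


def remap_dump(rows, dim, merges):
    """Return copies of the rows with `dim` translated to the current layout."""
    return [{**row, dim: resolve(row[dim], merges)} for row in rows]


def total_by(rows, dim):
    """Sum the amount for each value of `dim`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim]] += row["amount"]
    return dict(totals)


old_dump = [
    {"department": "D01", "amount": 5.0},
    {"department": "D02", "amount": 7.0},
    {"department": "D03", "amount": 2.0},
    {"department": "D04", "amount": 1.0},
]
print(total_by(remap_dump(old_dump, "department", MERGES), "department"))
# {'D01': 13.0, 'D03': 2.0}
```

Remapping at load time keeps queries simple; keeping the raw codes and remapping at query time would instead preserve the historical layout for older reports.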
>> 
> >> For these reasons I would suggest evaluating either a commercial OLAP
> >> server or an open source one, such as:
> >>
> >>  http://mondrian.pentaho.org/
> >> 
> >> It would be perfectly possible to wrap a number of MDX
>> queries up in a REST-ful API with documented responses, but at the same time
>> maintain the true structure of the data so that more advanced queries would
>> still be possible.
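Wrapping MDX behind a REST-ful API could be as thin as translating request parameters into a query string. This is a minimal sketch only: the `/compare` endpoint, its parameters, and the `Spending` cube with `Region`, `Function` and `Time` dimensions and an `Amount` measure are all hypothetical, and actually sending the query to Mondrian or any other server is left out.

```python
from urllib.parse import parse_qs, urlparse

CUBE = "Spending"  # hypothetical cube name; a real schema defines its own


def member(dimension, name):
    """Build an MDX member reference such as [Region].[Hackney].
    A ']' inside a name is escaped by doubling it."""
    return "[" + dimension + "].[" + str(name).replace("]", "]]") + "]"


def compare_regions_mdx(url):
    """Translate a hypothetical REST request into a single MDX query string,
    e.g. /compare?function=Schools&regions=Merseyside,Hackney&from=2006&to=2008"""
    params = parse_qs(urlparse(url).query)
    function = params["function"][0]
    regions = params["regions"][0].split(",")
    start, end = int(params["from"][0]), int(params["to"][0])
    years = "{" + member("Time", start) + ":" + member("Time", end) + "}"
    rows = "{" + ", ".join(member("Region", r) for r in regions) + "}"
    slicer = "(" + member("Function", function) + ", [Measures].[Amount])"
    return (f"SELECT {years} ON COLUMNS, {rows} ON ROWS "
            f"FROM [{CUBE}] WHERE {slicer}")


mdx = compare_regions_mdx(
    "/compare?function=Schools&regions=Merseyside,Hackney&from=2006&to=2008")
print(mdx)
# SELECT {[Time].[2006]:[Time].[2008]} ON COLUMNS, {[Region].[Merseyside], [Region].[Hackney]} ON ROWS FROM [Spending] WHERE ([Function].[Schools], [Measures].[Amount])
```

Putting the years on the columns axis means the response already holds one figure per region and year, while the underlying cube stays available for more advanced queries.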
>> 
>> Hope that helps, tell me to shut up if I'm being annoying!!
>> 
>> Donny.
>> 
>> 
>> On 13 April 2010 08:41, <lisa.evans at okfn.org> wrote:
>> 
>>> Hi Donovan,
>>> 
> >>> Thank you for this good investigation work; it would be great if you would
> >>> like to continue working on it. I've added an answer to one of your
>>> questions inline.
>>> 
>>> On Mon, 12 Apr 2010, Donovan Hide wrote:
>>> 
>>> [snip]
>>>
> >>>> Two interesting questions come to mind:
>>>> 
>>>> 1. How to get hold of some of the data while not raising any of the
>>>> Treasury's concerns.
>>>> 2. What to do with the data, if we got hold of it!
>>>> 
>>> 
> >>> Once we have the data, we have a ready-built data store for it, which is
> >>> part of the 'where does my money go?' project. We will also be able to
> >>> visualise the spending as part of this project.
>>> 
> >>> You can see our prototype visualisation, which uses a report of COINS data,
>>> here:
>>> 
>>> http://www.wheredoesmymoneygo.org/prototype/
>>> 
>>> [snip]
>>> 
>>> Thanks again,
>>> 
>>> Lisa