[wdmmg-discuss] COINS FOI requests (fwd)
Julian Todd
julian at goatchurch.org.uk
Tue Apr 13 14:23:37 UTC 2010
Just to be clear, there are many uses for this data, beyond just OLAP.
Just because they have gathered it for OLAP, doesn't mean we
necessarily need to use it for their kind of analysis.
I know it's the boring case, but the MPs' expenses thing proved that
it's the details that matter. Having put all their aggregate expenses
into an OLAP system to analyse which areas of the country were more
expensive to represent totally misses the story of widespread
systematic corruption at ever level.
That said, if I did have the data, I'd like to derive from it the
public sector spending pattern of hurriedly blowing the remainder of
one's budget in April in order to prevent a budget cut the following
year. This is a tactic that works, because the people setting the
budget are looking at the aggregate value, and not inspecting what
stuff the money was spent on.
Julian.
On Tue, Apr 13, 2010 at 1:54 PM, Alistair Turnbull
<apt1002 at goose.minworks.co.uk> wrote:
> List, meet Donovan. He has volunteered the following advice on the
> aggregator. (I didn't know about Pentaho).
>
> Alistair
>
> ---------- Forwarded message ----------
> On Tue, 13 Apr 2010, Francis Irving wrote:
>
>> I've copied this to Alistair, as he ought to know about pentaho if he
>> doesn't already!
>>
>> Donny - Alistair is working on the code behind the WDMMG store.
>>
>> Alistair - Donny is a chap Julian found in Liverpool who used to work
>> in commodities trading, and is interested in helping WDMMG.
>>
>> Donny, make sure you are on the WDMMG discussion list :)
>>
>> Francis
>>
>> On Tue, Apr 13, 2010 at 11:11:22AM +0100, Donovan Hide wrote:
>>>
>>> Hi Lisa,
>>>
>>> thanks for the reply. I realise that my questions might have been a bit
>>> vague!! The second question was really trying to get at how to process
>>> the
>>> data in whatever form it might come across in. I had seen your (very
>>> impressive!) visualisation and think it's great. I've done some Flex
>>> stuff
>>> recently and can appreciate how much work it must have been to get
>>> everything laid out nicely.
>>>
>>> The real nub of my point was to highlight that the data is
>>> multi-dimensional
>>> in nature. What this means is that various different hierarchichies can
>>> be
>>> combined in arbitrary ways to slice and dice the cube of data to answer
>>> questions that could be expressed in English language. An example might
>>> be:
>>>
>>> Show me all spending on schools in Merseyside between 2006 and 2008
>>> compared
>>> with that in Hackney for the same time period.
>>>
>>> or the more complex:
>>>
>>> What is the rate of increase in nuclear decommissioning expenditure for
>>> the
>>> last 5 years compared with the year to date.
>>>
>>> This is what an OLAP system provides a basis for:
>>>
>>> http://en.wikipedia.org/wiki/Online_analytical_processing
>>>
>>> and is why the Treasury would have chosen a commercial product like
>>> Terasolve. The technical problems that these systems solve include:
>>>
>>> 1. Speed of queries for abitrary combinations of dimensions by
>>> pre-aggregating data
>>> 2. A query language that can express the combinations of dimension and
>>> calculations on the items that can make it useful (MDX)
>>> 3. The ability to change the layout of dimensions as, for example,
>>> government departments merge with or subsume others
>>>
>>> In the past I've worked with both commercial OLAP servers and hand-coded,
>>> on
>>> the fly, aggregators. There are definite advantages to both ways. Looking
>>> at
>>> the store prototype:
>>>
>>> http://store.wheredoesmymoneygo.org
>>>
>>> I can see a nice domain-driven model sitting on a key-value store
>>> (CouchDB?)
>>> which could be exposed REST-fully, which is great. The technical problem
>>> that I could foresee though is that the most common query you might
>>> receive
>>> is what is the sum of all 23 million transactions for the top-most level
>>> of
>>> each of the dimensions. Without a pre-aggregation, the simplest query
>>> could
>>> potentially be the most resource-intensive one. Hence OLAP was invented!!
>>>
>>> The other issue that could arise is how to map a dump of their OLAP
>>> database
>>> to a different schema, and what to do for subsequent dumps if the layout
>>> of
>>> dimensions has changed. Dimensions are generally fluid and dynamic by
>>> nature.
>>>
>>> It is for these reasons that I would suggest an evaluation of either a
>>> commercial or an open source OLAP server such as:
>>>
>>> http://mondrian.pentaho.org/
>>>
>>> would be useful. It would be perfectly possible to wrap a number of MDX
>>> queries up in a REST-ful API with documented responses, but at the same
>>> time
>>> maintain the true structure of the data so that more advanced queries
>>> would
>>> still be possible.
>>>
>>> Hope that helps, tell me to shut up if I'm being annoying!!
>>>
>>> Donny.
>>>
>>>
>>> On 13 April 2010 08:41, <lisa.evans at okfn.org> wrote:
>>>
>>>> Hi Donovan,
>>>>
>>>> Thank you for this good invetigation work, it would be great if you
>>>> would
>>>> like to continue to work on this. I've added an answer to one of your
>>>> questions inline.
>>>>
>>>> On Mon, 12 Apr 2010, Donovan Hide wrote:
>>>>
>>>> [snip]
>>>>
>>>> Two interesting questions come to mind:
>>>>>
>>>>> 1. How to get hold of some of the data while not raising any of the
>>>>> Treasury's concerns.
>>>>> 2. What to do with the data, if we got hold of it!
>>>>>
>>>>
>>>> When we have the data we have a ready built data store for it, which is
>>>> part of the 'where does my money go?' project. We will aslo be able to
>>>> visulise the spending as part of this project.
>>>>
>>>> You can see our prototype visulisation, which uses a report of COINS
>>>> data,
>>>> here:
>>>>
>>>> http://www.wheredoesmymoneygo.org/prototype/
>>>>
>>>> [snip]
>>>>
>>>> Thanks again,
>>>>
>>>> Lisa
>
> _______________________________________________
> wdmmg-discuss mailing list
> wdmmg-discuss at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/wdmmg-discuss
>
More information about the openspending
mailing list