[wdmmg-discuss] COINS FOI requests (fwd)

Tue Apr 13 15:49:08 UTC 2010

Donny is right that aggregation and filtering is a large part of what our 
store will spend its time doing. I'm working on that code right now, as it 
happens.

We're using a custom-written aggregator at the moment, and honestly we 
probably aren't going to switch to anything else this month (at 75K 
transactions, the CRA isn't big enough to cause problems). However, in the 
longer term an alternative query engine optimised for this sort of work 
might well become interesting.

 	Alistair

On Tue, 13 Apr 2010, Donovan Hide wrote:

> Hi Julian!
> 
> I agree that OLAP tools can be used for examining the "macro" or top-level picture through the use of aggregations.
> However, they are also excellent for examining the data at it's finest granularity. It would be a simple matter for
> instance to write a query like:
> 
> show me all transactions which were of exactly the same value from the same supplier
> 
> or
> 
> show me the region which has the most transactions with supplier A
> 
> MDX is basically a better query language than SQL for any  type of question that involves varying units of time, and
> comparing like with like from different segments of the data. All the transactions would be just as accessible as if they
> were in a SQL database or a key-value store.
> 
> Preserving the dimensions allows for the hallowed "drilldown" functionality, the using of which often helps you find the
> trends that you then write the queries to back up. If you haven't used a Business Intelligence tool where you drag the
> dimensions around at will in a GUI, then it can be quite an eye opener the first time! The custom report editor of Google
> Analytics is probably the most widely available example of it in action. The pivot table in Excel is another, cruder
> example.
> 
> It's all horses for courses, about which tools are best really. It's often more fun to write it all from scratch and use
> something new like Redis that's blazing fast and try and get some features implemented in the bleeding edge of it's
> development (like aggregations and grouping!).
> 
> Cheers,
> Donny.
> 
> On 13 April 2010 15:23, Julian Todd <julian at goatchurch.org.uk> wrote:
>       Just to be clear, there are many uses for this data, beyond just OLAP.
>        Just because they have gathered it for OLAP, doesn't mean we
>       necessarily need to use it for their kind of analysis.
>
>       I know it's the boring case, but the MPs' expenses thing proved that
>       it's the details that matter.  Having put all their aggregate expenses
>       into an OLAP system to analyse which areas of the country were more
>       expensive to represent totally misses the story of widespread
>       systematic corruption at ever level.
>
>       That said, if I did have the data, I'd like to derive from it the
>       public sector spending pattern of hurriedly blowing the remainder of
>       one's budget in April in order to prevent a budget cut the following
>       year.  This is a tactic that works, because the people setting the
>       budget are looking at the aggregate value, and not inspecting what
>       stuff the money was spent on.
>
>       Julian.
> 
> 
> 
>
>       On Tue, Apr 13, 2010 at 1:54 PM, Alistair Turnbull
>       <apt1002 at goose.minworks.co.uk> wrote:
> > List, meet Donovan. He has volunteered the following advice on the
> > aggregator. (I didn't know about Pentaho).
> >
> >        Alistair
> >
> > ---------- Forwarded message ----------
> > On Tue, 13 Apr 2010, Francis Irving wrote:
> >
> >> I've copied this to Alistair, as he ought to know about pentaho if he
> >> doesn't already!
> >>
> >> Donny - Alistair is working on the code behind the WDMMG store.
> >>
> >> Alistair - Donny is a chap Julian found in Liverpool who used to work
> >> in commodities trading, and is interested in helping WDMMG.
> >>
> >> Donny, make sure you are on the WDMMG discussion list :)
> >>
> >> Francis
> >>
> >> On Tue, Apr 13, 2010 at 11:11:22AM +0100, Donovan Hide wrote:
> >>>
> >>> Hi Lisa,
> >>>
> >>> thanks for the reply. I realise that my questions might have been a bit
> >>> vague!! The second question was really trying to get at how to process
> >>> the
> >>> data in whatever form it might come across in. I had seen your (very
> >>> impressive!) visualisation and think it's great. I've done some Flex
> >>> stuff
> >>> recently and can appreciate how much work it must have been to get
> >>> everything laid out nicely.
> >>>
> >>> The real nub of my point was to highlight that the data is
> >>> multi-dimensional
> >>> in nature. What this means is that various different hierarchichies can
> >>> be
> >>> combined in arbitrary ways to slice and dice the cube of data to answer
> >>> questions that could be expressed in English language. An example might
> >>> be:
> >>>
> >>> Show me all spending on schools in Merseyside between 2006 and 2008
> >>> compared
> >>> with that in Hackney for the same time period.
> >>>
> >>> or the more complex:
> >>>
> >>> What is the rate of increase in nuclear decommissioning expenditure for
> >>> the
> >>> last 5 years compared with the year to date.
> >>>
> >>> This is what an OLAP system provides a basis for:
> >>>
> >>> http://en.wikipedia.org/wiki/Online_analytical_processing
> >>>
> >>> and is why the Treasury would have chosen a commercial product like
> >>> Terasolve. The technical problems that these systems solve include:
> >>>
> >>> 1. Speed of queries for abitrary combinations of dimensions by
> >>> pre-aggregating data
> >>> 2. A query language that can express the combinations of dimension and
> >>> calculations on the items that can make it useful (MDX)
> >>> 3. The ability to change the layout of dimensions as, for example,
> >>> government departments merge with or subsume others
> >>>
> >>> In the past I've worked with both commercial OLAP servers and hand-coded,
> >>> on
> >>> the fly, aggregators. There are definite advantages to both ways. Looking
> >>> at
> >>> the store prototype:
> >>>
> >>> http://store.wheredoesmymoneygo.org
> >>>
> >>> I can see a nice domain-driven model sitting on a key-value store
> >>> (CouchDB?)
> >>> which could be exposed REST-fully, which is great. The technical problem
> >>> that I could foresee though is that the most common query you might
> >>> receive
> >>> is what is the sum of all 23 million transactions for the top-most level
> >>> of
> >>> each of the dimensions. Without a pre-aggregation, the simplest query
> >>> could
> >>> potentially be the most resource-intensive one. Hence OLAP was invented!!
> >>>
> >>> The other issue that could arise is how to map a dump of their OLAP
> >>> database
> >>> to a different schema, and what to do for subsequent dumps if the layout
> >>> of
> >>> dimensions has changed. Dimensions are generally fluid and dynamic by
> >>> nature.
> >>>
> >>> It is for these reasons that I would suggest an evaluation of either a
> >>> commercial or an open source OLAP server such as:
> >>>
> >>>  http://mondrian.pentaho.org/
> >>>
> >>> would be useful. It would be perfectly possible to wrap a number of MDX
> >>> queries up in a REST-ful API with documented responses, but at the same
> >>> time
> >>> maintain the true structure of the data so that more advanced queries
> >>> would
> >>> still be possible.
> >>>
> >>> Hope that helps, tell me to shut up if I'm being annoying!!
> >>>
> >>> Donny.
> >>>
> >>>
> >>> On 13 April 2010 08:41, <lisa.evans at okfn.org> wrote:
> >>>
> >>>> Hi Donovan,
> >>>>
> >>>> Thank you for this good invetigation work, it would be great if you
> >>>> would
> >>>> like to continue to work on this. I've added an answer to one of your
> >>>> questions inline.
> >>>>
> >>>> On Mon, 12 Apr 2010, Donovan Hide wrote:
> >>>>
> >>>> [snip]
> >>>>
> >>>>  Two interesting questions come to mind:
> >>>>>
> >>>>> 1. How to get hold of some of the data while not raising any of the
> >>>>> Treasury's concerns.
> >>>>> 2. What to do with the data, if we got hold of it!
> >>>>>
> >>>>
> >>>> When we have the data we have a ready built data store for it, which is
> >>>> part of the 'where does my money go?' project. We will aslo be able to
> >>>> visulise the spending as part of this project.
> >>>>
> >>>> You can see our prototype visulisation, which uses a report of COINS
> >>>> data,
> >>>> here:
> >>>>
> >>>> http://www.wheredoesmymoneygo.org/prototype/
> >>>>
> >>>> [snip]
> >>>>
> >>>> Thanks again,
> >>>>
> >>>> Lisa
> >
> > _______________________________________________
> > wdmmg-discuss mailing list
> > wdmmg-discuss at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/wdmmg-discuss
> >
> 
> 
> 
>