[wdmmg-discuss] COINS FOI requests (fwd)

Donovan Hide donny at sur.ly
Tue Apr 13 14:43:43 UTC 2010


Hi Julian!

I agree that OLAP tools can be used for examining the "macro" or top-level
picture through the use of aggregations. However, they are also excellent
for examining the data at it's finest granularity. It would be a simple
matter for instance to write a query like:

show me all transactions which were of exactly the same value from the same
supplier

or

show me the region which has the most transactions with supplier A

MDX is basically a better query language than SQL for any  type of question
that involves varying units of time, and comparing like with like from
different segments of the data. All the transactions would be just as
accessible as if they were in a SQL database or a key-value store.

Preserving the dimensions allows for the hallowed "drilldown" functionality,
the using of which often helps you find the trends that you then write the
queries to back up. If you haven't used a Business Intelligence tool where
you drag the dimensions around at will in a GUI, then it can be quite an eye
opener the first time! The custom report editor of Google Analytics is
probably the most widely available example of it in action. The pivot table
in Excel is another, cruder example.

It's all horses for courses, about which tools are best really. It's often
more fun to write it all from scratch and use something new like Redis
that's blazing fast and try and get some features implemented in the
bleeding edge of it's development (like aggregations and grouping!).

Cheers,
Donny.

On 13 April 2010 15:23, Julian Todd <julian at goatchurch.org.uk> wrote:

> Just to be clear, there are many uses for this data, beyond just OLAP.
>  Just because they have gathered it for OLAP, doesn't mean we
> necessarily need to use it for their kind of analysis.
>
> I know it's the boring case, but the MPs' expenses thing proved that
> it's the details that matter.  Having put all their aggregate expenses
> into an OLAP system to analyse which areas of the country were more
> expensive to represent totally misses the story of widespread
> systematic corruption at ever level.
>
> That said, if I did have the data, I'd like to derive from it the
> public sector spending pattern of hurriedly blowing the remainder of
> one's budget in April in order to prevent a budget cut the following
> year.  This is a tactic that works, because the people setting the
> budget are looking at the aggregate value, and not inspecting what
> stuff the money was spent on.
>
> Julian.
>
>
>
>
> On Tue, Apr 13, 2010 at 1:54 PM, Alistair Turnbull
> <apt1002 at goose.minworks.co.uk> wrote:
> > List, meet Donovan. He has volunteered the following advice on the
> > aggregator. (I didn't know about Pentaho).
> >
> >        Alistair
> >
> > ---------- Forwarded message ----------
> > On Tue, 13 Apr 2010, Francis Irving wrote:
> >
> >> I've copied this to Alistair, as he ought to know about pentaho if he
> >> doesn't already!
> >>
> >> Donny - Alistair is working on the code behind the WDMMG store.
> >>
> >> Alistair - Donny is a chap Julian found in Liverpool who used to work
> >> in commodities trading, and is interested in helping WDMMG.
> >>
> >> Donny, make sure you are on the WDMMG discussion list :)
> >>
> >> Francis
> >>
> >> On Tue, Apr 13, 2010 at 11:11:22AM +0100, Donovan Hide wrote:
> >>>
> >>> Hi Lisa,
> >>>
> >>> thanks for the reply. I realise that my questions might have been a bit
> >>> vague!! The second question was really trying to get at how to process
> >>> the
> >>> data in whatever form it might come across in. I had seen your (very
> >>> impressive!) visualisation and think it's great. I've done some Flex
> >>> stuff
> >>> recently and can appreciate how much work it must have been to get
> >>> everything laid out nicely.
> >>>
> >>> The real nub of my point was to highlight that the data is
> >>> multi-dimensional
> >>> in nature. What this means is that various different hierarchichies can
> >>> be
> >>> combined in arbitrary ways to slice and dice the cube of data to answer
> >>> questions that could be expressed in English language. An example might
> >>> be:
> >>>
> >>> Show me all spending on schools in Merseyside between 2006 and 2008
> >>> compared
> >>> with that in Hackney for the same time period.
> >>>
> >>> or the more complex:
> >>>
> >>> What is the rate of increase in nuclear decommissioning expenditure for
> >>> the
> >>> last 5 years compared with the year to date.
> >>>
> >>> This is what an OLAP system provides a basis for:
> >>>
> >>> http://en.wikipedia.org/wiki/Online_analytical_processing
> >>>
> >>> and is why the Treasury would have chosen a commercial product like
> >>> Terasolve. The technical problems that these systems solve include:
> >>>
> >>> 1. Speed of queries for abitrary combinations of dimensions by
> >>> pre-aggregating data
> >>> 2. A query language that can express the combinations of dimension and
> >>> calculations on the items that can make it useful (MDX)
> >>> 3. The ability to change the layout of dimensions as, for example,
> >>> government departments merge with or subsume others
> >>>
> >>> In the past I've worked with both commercial OLAP servers and
> hand-coded,
> >>> on
> >>> the fly, aggregators. There are definite advantages to both ways.
> Looking
> >>> at
> >>> the store prototype:
> >>>
> >>> http://store.wheredoesmymoneygo.org
> >>>
> >>> I can see a nice domain-driven model sitting on a key-value store
> >>> (CouchDB?)
> >>> which could be exposed REST-fully, which is great. The technical
> problem
> >>> that I could foresee though is that the most common query you might
> >>> receive
> >>> is what is the sum of all 23 million transactions for the top-most
> level
> >>> of
> >>> each of the dimensions. Without a pre-aggregation, the simplest query
> >>> could
> >>> potentially be the most resource-intensive one. Hence OLAP was
> invented!!
> >>>
> >>> The other issue that could arise is how to map a dump of their OLAP
> >>> database
> >>> to a different schema, and what to do for subsequent dumps if the
> layout
> >>> of
> >>> dimensions has changed. Dimensions are generally fluid and dynamic by
> >>> nature.
> >>>
> >>> It is for these reasons that I would suggest an evaluation of either a
> >>> commercial or an open source OLAP server such as:
> >>>
> >>>  http://mondrian.pentaho.org/
> >>>
> >>> would be useful. It would be perfectly possible to wrap a number of MDX
> >>> queries up in a REST-ful API with documented responses, but at the same
> >>> time
> >>> maintain the true structure of the data so that more advanced queries
> >>> would
> >>> still be possible.
> >>>
> >>> Hope that helps, tell me to shut up if I'm being annoying!!
> >>>
> >>> Donny.
> >>>
> >>>
> >>> On 13 April 2010 08:41, <lisa.evans at okfn.org> wrote:
> >>>
> >>>> Hi Donovan,
> >>>>
> >>>> Thank you for this good invetigation work, it would be great if you
> >>>> would
> >>>> like to continue to work on this. I've added an answer to one of your
> >>>> questions inline.
> >>>>
> >>>> On Mon, 12 Apr 2010, Donovan Hide wrote:
> >>>>
> >>>> [snip]
> >>>>
> >>>>  Two interesting questions come to mind:
> >>>>>
> >>>>> 1. How to get hold of some of the data while not raising any of the
> >>>>> Treasury's concerns.
> >>>>> 2. What to do with the data, if we got hold of it!
> >>>>>
> >>>>
> >>>> When we have the data we have a ready built data store for it, which
> is
> >>>> part of the 'where does my money go?' project. We will aslo be able to
> >>>> visulise the spending as part of this project.
> >>>>
> >>>> You can see our prototype visulisation, which uses a report of COINS
> >>>> data,
> >>>> here:
> >>>>
> >>>> http://www.wheredoesmymoneygo.org/prototype/
> >>>>
> >>>> [snip]
> >>>>
> >>>> Thanks again,
> >>>>
> >>>> Lisa
> >
> > _______________________________________________
> > wdmmg-discuss mailing list
> > wdmmg-discuss at lists.okfn.org
> > http://lists.okfn.org/mailman/listinfo/wdmmg-discuss
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/openspending/attachments/20100413/604c2d99/attachment.html>


More information about the openspending mailing list