[wdmmg-discuss] CRA 2010: description and questions

Wed Aug 11 21:34:25 UTC 2010

Hi all,

I've just started some work on the WDMMG data store, and Lisa and I
have been looking at the data for CRA 2010, preparatory to loading it
into the store.

Here's what we've discovered, and some questions for discussion
(Alistair, Will?):

**The data**

CRA 2010 consists of *two* spreadsheets, both in the CKAN package at
http://www.ckan.net/package/ukgov-finances-cra

As well as now being in two spreadsheets, it has also become slightly
less granular.

The two tables both show total spending, but classify it differently:
by region and by sub-function. Table 9 classifies spending items by
country, 9 regional areas, and COFOG 1 (e.g. England, East Midlands,
Social protection). Table 10 classifies its items by country, COFOG 1
and COFOG 2 (e.g. England, Social protection, Old age).

Both have just over 20,000 rows and show data from 2004-5 onwards. The
Treasury claims the tables are consistent. One more difference between
the two: Table 9 has projected spending for 2010-11, but Table 10 does
not.
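
One way to test the consistency claim would be to roll both tables up
to the keys they share (country, COFOG 1, year) and compare the
totals. A rough sketch only: the column names, figures and tolerance
below are made up for illustration, and the real spreadsheet headers
will differ.

```python
from collections import defaultdict

# Keys present in both tables (hypothetical column names).
SHARED_KEYS = ("country", "cofog1", "year")


def rollup(rows, shared_keys=SHARED_KEYS):
    """Sum the amount column over the keys both tables share."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in shared_keys)] += row["amount"]
    return dict(totals)


def inconsistencies(table9, table10, tolerance=0.5):
    """Return {key: (table9_total, table10_total)} where totals disagree."""
    t9, t10 = rollup(table9), rollup(table10)
    diffs = {}
    for key in set(t9) | set(t10):
        a, b = t9.get(key, 0.0), t10.get(key, 0.0)
        if abs(a - b) > tolerance:
            diffs[key] = (a, b)
    return diffs


# Invented rows, not real CRA figures.
table9 = [
    {"country": "England", "region": "East Midlands",
     "cofog1": "Social protection", "year": "2008-09", "amount": 100.0},
    {"country": "England", "region": "London",
     "cofog1": "Social protection", "year": "2008-09", "amount": 150.0},
]
table10 = [
    {"country": "England", "cofog1": "Social protection",
     "cofog2": "Old age", "year": "2008-09", "amount": 180.0},
    {"country": "England", "cofog1": "Social protection",
     "cofog2": "Unemployment", "year": "2008-09", "amount": 70.0},
]

print(inconsistencies(table9, table10))  # {} when the totals agree
```

The 2010-11 projections exist only in Table 9, so they would need to
be filtered out first, otherwise every one of them shows up as a
mismatch.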

**Differences from CRA 2009**

Last year every item was classified by both region and sub-function,
so the data seems to have become less granular overall. Basically, you
can now classify it either by region, or by sub-function, but not
both.

A few minor differences: we have gained Treasury classifications of
spending, which seem to be analogous to COFOG but not identical. The
'CG or LG' column (central or local government) has gained a third
option and is now 'CG, LG or PC' (public corporation - like the Met
Office & World Service).

Lisa says that the 'unknown' fields that caused problems last time are
less problematic this time - I don't know much about this, but it
sounds like good news.

**Questions**

1. Do we load this as one slice or two, given the two ways of
classifying data? One slice seems feasible, but messy: I guess you
would have columns for both region and cofog2 and always leave one of
them null, at the cost of a lot of potentially duplicate rows and
fairly complex queries. Advice appreciated.

2. We now have actual 2009-10 spending to compare with last year's
projected spending (though unfortunately it is less granular). Unless
anyone objects, I'm thinking of adding a projected/actual key to the
data store to deal with this, as it seems to be a common issue with
spending data. Also, do we want to do anything with this comparison?

3. On a related note, should I load in the data from past years in CRA
2010, or do we assume that this would just duplicate CRA 2009?

4. Finally, I'll add the Treasury classifications as a new key, unless
anyone objects.
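
To make questions 1 and 2 a bit more concrete, here is a rough sketch
of the one-slice option and of a projected/actual key. It is not the
data store's actual schema: the `source_table` and `status` keys, the
other column names and all the figures are invented for illustration.

```python
from collections import defaultdict


def merge_into_one_slice(table9, table10):
    """Question 1: one slice, with the key a table lacks left as None."""
    entries = []
    for row in table9:
        entries.append({**row, "cofog2": None, "source_table": 9})
    for row in table10:
        entries.append({**row, "region": None, "source_table": 10})
    return entries


def total(entries, source_table=None):
    """Sum amounts, optionally restricted to rows from one source table."""
    return sum(e["amount"] for e in entries
               if source_table is None or e["source_table"] == source_table)


def actual_minus_projected(entries, year):
    """Question 2: actual minus projected per (country, cofog1) for a year."""
    sums = {"projected": defaultdict(float), "actual": defaultdict(float)}
    for e in entries:
        if e["year"] == year:
            sums[e["status"]][(e["country"], e["cofog1"])] += e["amount"]
    shared = sums["projected"].keys() & sums["actual"].keys()
    return {k: sums["actual"][k] - sums["projected"][k] for k in shared}


# Invented rows, not real CRA figures.
table9 = [{"country": "England", "region": "East Midlands",
           "cofog1": "Social protection", "year": "2009-10",
           "status": "actual", "amount": 520.0}]
table10 = [{"country": "England", "cofog1": "Social protection",
            "cofog2": "Old age", "year": "2009-10",
            "status": "actual", "amount": 520.0}]
cra2009 = [
    {"country": "England", "cofog1": "Social protection",
     "year": "2009-10", "status": "projected", "amount": 300.0},
    {"country": "England", "cofog1": "Social protection",
     "year": "2009-10", "status": "projected", "amount": 200.0},
]

entries = merge_into_one_slice(table9, table10)
print(total(entries))                  # 1040.0: same spending counted twice
print(total(entries, source_table=9))  # 520.0

table9_only = [e for e in entries if e["source_table"] == 9]
comparison = actual_minus_projected(cra2009 + table9_only, "2009-10")
print(comparison)  # {('England', 'Social protection'): 20.0}
```

The double counting in the first total is the "fairly complex
queries" problem: with one slice, every query has to remember to pick
one of the two classifications before summing.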

best wishes
Anna