[wdmmg-discuss] CRA data analysis
dave at iconomical.com
Fri Jan 15 13:59:10 UTC 2010
Attention deficit notice: this email is quite detailed, and can probably be skipped if you're not interested in the tedious nitty-gritty aspects of the data.
The prototype was developed using the PESA data, which contains aggregations of UK public spending. In general, spending is classified using a scheme called COFOG: this defines a number of top-level functions, and each function is further broken down into sub-functions. For instance, pensions are part of the Social protection function, and Old age subfunction. More info about COFOG here: http://unstats.un.org/unsd/cr/registry/regcst.asp?Cl=4
Money is spent either by central government, or by local authorities. Central government spending is performed by one a number of government departments, like the Dept of Transport, the Home Office, etc. Local authority spending gets grouped by area, depending on where the local authority is: England is broken down by region; Scotland, Wales and NI are pretty obvious. PESA doesn't differentiate between things like county, unitary authority, regions, etc: they're all grouped together as local authority spending.
Spending can also be defined as identifiable or non-identifiable. Identifiable spending means that the money is spent for the benefit of a particular region, eg, pension spending for people in Wales. Almost everything is identifiable spending, even spending that goes outside the UK.
By contrast, non-identifiable spending is spending that's thought to benefit the UK as a whole. For instance, much of defence spending, the civil list, national institutions, and the national debt fall into this category. (As does spending on the London Olympics, in spite of the word 'London'.)
The CRA (county and regional analysis) data is a very large (25K rows) spreadsheet containing detailed spending breakdowns by department and program, so it offers much more granularity than PESA. What CRA allows us to do is take a look at a COFOG function or subfunction, and find out how the money is allocated between different spending programs, or between different regions. In theory, CRA should contain most of the raw information that was used to generate PESA.
We've been spending time this week looking at the CRA data, mainly to understand what's in it, and what we can do with it. Here's the current status:
The biggest item is that the top-level aggregate numbers derived from CRA (ie, the top-level function totals) match the PESA numbers, give or take. This shouldn't be a surprise, since PESA is supposed to have been generated from CRA in the first place.
We can work with departments now, which we couldn't do before. This means we can look at central government spending across departments, over a number of years. We can also link up departments with their spending in terms of COFOG. For instance, we can look at a particular department and see how its spending breaks down by function, or do the reverse, look at a function and see which departments do their spending there.
We have proper drill-down by programs. This means that we can look at spending programs in terms of how much money was spent on them, to find the biggest ones. We can do this for the uk as a whole, for countries, and for regions. For instance, we can look at Environment spending in the North West, and see that it's almost all due to the Nuclear Decommissioning Authority.
We have the ability to do searches through the program items:
- find all the olympics spending;
- find all the pensions spending;
- find all the museum spending.
Addition of the program items is a big deal.
The most fundamental issue in CRA is that none of the English local authority spending is broken down further than the top-level COFOG function. That means that for a region like, say, London, we can get local authority spending on social protection, but can't find out how that was divided between housing, sickness/disability, etc. Two consequences of that:
- we can't show this breakdown. (Ok, the prototype had the same problem because it wasn't in PESA either);
- we can't generate the UK-wide COFOG subfunction breakdown just from CRA (but we can get it from PESA).
However, in order to create the UK-wide subfunction breakdown in the first place, the people who compiled PESA must have the local authority spending broken down by subfunction, somehow. There are also additional PESA tables that can't be generated from CRA.
The lack of drill-down for LA spending into English regions is a really big problem. We can get subfunction and program drill-down for the other nations, but not for England. This is painful since LAs are responsible for the majority of the spending of items like education.
You can find an interesting spreadsheet on the CRA website called cra_form.xls, which appears to be the form sent to the Treasury by local authorities, and is supposed to contain their submissions to the PESA/CRA process. One of the sheets has contact information, so this could be a route to getting more detailed info.
Spending programs are owned by individual spending departments. (There are some exemptions, but they're just accounting tweaks.) This is, there's a one to one relationship between programs and departments.
Some spending programs are linked to more than one COFOG function and/or subfunction. That is, a one to many relationship between programs and COFOG functions/subfunctions.
Programs can have multiple rows in the database. Most of this is because a program's spending is split across multiple regions, capital/current, id/non-id, etc, but there are a number of cases where rows are the same, except for the actual numbers.
The Department codes are unreliable, since there are a number of mismatches. I haven't made any investigation as to how easy it is to reconcile them, and I'm taking the position that the department name is the important field.
CRA broadly conforms to COFOG, but there many thousands of rows which don't have the correct function/subfunction definitions. In almost all cases, it's possible to re-map them correctly. Except for health spending, and it's not feasible for us to fix that.
I have a Python script which takes the original CRA data, and outputs a new version with fixed COFOG functions and subfunctions. This is particularly useful if anyone wants to do things like put CRA into an SQL database, or run pivot table analyses. Let me know if you want the script.
The cra_data.xls file contains a reference to another interesting data source: http://www.statistics.gov.uk/geography/downloads/presadminannexb.pdf
This is a detailed description of how the geographical regions in PESA/CRA break down, in terms of smaller regional authorities. It also contains information about population!
We might get some milage by looking at other government reporting. For example, the Scottish executive produce their own accounts. (Here's an example: http://www.scotland.gov.uk/Publications/2009/06/18101733/0) And the treasury website contains a useful page pointing to a number of govt departments' reporting. (http://www.hm-treasury.gov.uk/psp_index.htm) I doubt it's feasible to do much with this kind of data in the short term, but perhaps it would be a useful exercise to at least identify what's out there.
We're going to continue with our analysis, but also look at visualization models, and think about the requirements and ideas that we've been discussing. In the meantime, I think there should be an action item to chase up the Treasury and see if we can get more detailed data, specifically to fill the hole in English LA spending. Who wants to take this up?
More information about the openspending