[OpenSpending-discuss] How Spending Stories Spots Errors in Spending Data

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Tue Dec 6 10:12:27 UTC 2011


On Tue, Dec 6, 2011 at 11:01 AM, Alex (Maxious) Sadleir
<maxious at gmail.com> wrote:
> Something I have been considering in preparing Australian data for
> OpenSpending is that it would be very interesting to have a software
> package to rate spending items the way an email spam filter detects spam.
> The most interesting spending items to a human are often the ones with
> the most things "wrong" with them on a data level: edited to double
> the value, recorded in a database after the money was spent, or a
> government agency or supplier suddenly dealing in large sums.

This is really important, and we've been discussing it a bit. The
question really is: what kinds of algorithms or heuristics can we use
to detect outliers? Are these techniques one-size-fits-all, or do we
need to select different ones for each dataset? (I'm pretty sure they
need to differ between spending and budget data, but I think we can
generalize otherwise.) And when do they get run? Are we still talking
about QA measures here, or about actual data mining on the loaded data?
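As one concrete candidate for the outlier question, here is a minimal sketch of a single, widely used heuristic: the modified z-score of Iglewicz and Hoaglin, which measures how far each amount sits from the dataset's median in units of the median absolute deviation (MAD). This is a swapped-in standard technique, not something OpenSpending implements; the `amount` field and the function names are hypothetical, and records are assumed to be plain dicts.

```python
from statistics import median


def modified_z_scores(amounts):
    """Modified z-score of each amount (Iglewicz and Hoaglin).

    Uses the median and the median absolute deviation (MAD) instead of the
    mean and standard deviation, so a few huge payments cannot inflate the
    yardstick that is used to judge them.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # More than half of the amounts are identical: treat any value that
        # differs from that majority as maximally unusual.
        return [0.0 if x == med else float("inf") for x in amounts]
    return [0.6745 * (x - med) / mad for x in amounts]


def flag_outliers(records, key="amount", threshold=3.5):
    """Return records whose |modified z-score| exceeds `threshold`.

    `key` names the numeric field to test (hypothetical); 3.5 is the
    conventional cut-off and is exactly the kind of parameter that might
    need tuning per dataset, e.g. spending vs. budget data.
    """
    if not records:
        return []
    scores = modified_z_scores([r[key] for r in records])
    return [r for r, z in zip(records, scores) if abs(z) > threshold]


payments = [{"id": i, "amount": a}
            for i, a in enumerate([120, 95, 110, 105, 98, 101, 5000])]
print([r["amount"] for r in flag_outliers(payments)])  # [5000]
```

Because the scores are computed relative to each dataset's own median and spread, the same function adapts to datasets of very different scale; whether that is enough to make it "one-size-fits-all" is exactly the open question above.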

I tend to see it as actual data mining rather than QA, and I've been
thinking about the idea of analytics snippets: we could offer the
option to run small pieces of JavaScript (it's easy to sandbox and easy
to learn) over an entire dataset, emitting "matching" records into a
"review bucket". We could then share snippets between datasets: if
someone implements a nice algorithm, it could be parameterized and
reused.
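The analytics-snippet idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an OpenSpending API: it uses Python in place of the sandboxed JavaScript suggested above, and `Snippet`, `run_snippets` and the record fields (`amount`, `original_amount`, `paid_on`, `recorded_on`) are all hypothetical. The two example checks encode two of Alex's "wrong on a data level" signals, and each snippet carries a weight so that, as in a spam filter, records that trip several checks rise to the top of the review bucket.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Snippet:
    """A reusable, parameterized check (hypothetical; not an OpenSpending API)."""
    name: str
    check: Callable[[Record, Dict[str, Any]], bool]
    params: Dict[str, Any] = field(default_factory=dict)
    weight: float = 1.0  # contribution to a spam-filter-style score


def run_snippets(dataset: List[Record], snippets: List[Snippet],
                 min_score: float = 1.0) -> List[Dict[str, Any]]:
    """Run every snippet over the whole dataset and fill a review bucket.

    Each entry lists which snippets matched and the sum of their weights;
    entries scoring below `min_score` are dropped, and the bucket is sorted
    so the most suspicious records come first.
    """
    bucket = []
    for record in dataset:
        hits = [s for s in snippets if s.check(record, s.params)]
        score = sum(s.weight for s in hits)
        if hits and score >= min_score:
            bucket.append({"record": record,
                           "matched": [s.name for s in hits],
                           "score": score})
    bucket.sort(key=lambda entry: entry["score"], reverse=True)
    return bucket


# Two example checks modelled on signals from Alex's mail. The field names
# (amount, original_amount, paid_on, recorded_on) are assumptions.
def edited_upward(record: Record, params: Dict[str, Any]) -> bool:
    """Amount was raised by at least `factor` relative to its original value."""
    original = record.get("original_amount")
    return original is not None and record["amount"] >= params["factor"] * original


def recorded_after_payment(record: Record, params: Dict[str, Any]) -> bool:
    """Entry was recorded only after the money had already been paid out."""
    return record["recorded_on"] > record["paid_on"]


SNIPPETS = [
    # The same check could be re-used on another dataset with a different factor.
    Snippet("edited-upward", edited_upward, {"factor": 2}, weight=2.0),
    Snippet("recorded-after-payment", recorded_after_payment),
]

dataset = [
    {"id": "a", "amount": 500, "original_amount": 250,
     "paid_on": date(2011, 3, 1), "recorded_on": date(2011, 4, 1)},
    {"id": "b", "amount": 300, "original_amount": 300,
     "paid_on": date(2011, 3, 1), "recorded_on": date(2011, 2, 20)},
    {"id": "c", "amount": 900,
     "paid_on": date(2011, 5, 1), "recorded_on": date(2011, 5, 3)},
]

for entry in run_snippets(dataset, SNIPPETS):
    print(entry["record"]["id"], entry["score"], entry["matched"])
# a 3.0 ['edited-upward', 'recorded-after-payment']
# c 1.0 ['recorded-after-payment']
```

Keeping each snippet a small pure function makes it easy to share and parameterize; Alex's third signal (an agency or supplier suddenly dealing in large sums) would need per-supplier history, which a snippet could plausibly receive through its `params`.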

What do people think?

- Friedrich

> On Tue, Dec 6, 2011 at 8:29 PM, Lucy Chambers <lucy.chambers at okfn.org> wrote:
>> How do readers know that the numbers quoted in news stories about spending
>> are correct? As part of the work on Spending Stories, the OpenSpending team
>> have been working on creating a wiki-like system that lets the user scroll
>> back, trace and reproduce changes made to the dataset every step of the way,
>> from initial data release to visualisation.
>
> _______________________________________________
> openspending mailing list
> openspending at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/openspending



