[okfn-labs] Help clean up the UK spending data!
Friedrich Lindenberg
friedrich.lindenberg at okfn.org
Wed Aug 1 07:55:46 UTC 2012
Hey,
On Tue, Jul 31, 2012 at 4:11 PM, Thomas Kluyver <takowl at gmail.com> wrote:
> On 31 July 2012 14:30, Friedrich Lindenberg
> <friedrich.lindenberg at okfn.org> wrote:
>> e) Expense Type Code != Expense Type - try to keep code fields and
>> text fields separate.
>
> I don't know if it's easy with the platform, but showing a sample of
> the data in the column would make it easier to distinguish this kind
> of thing.
Yeah, that would be cool. The tool can actually store this kind of
contextual info, I just haven't added it to the UI yet.
>> I appreciate that this is an awkward process, but have come to the
>> conclusion that using more automation will just give us bad data. At
>> the moment, we've got around 4.3 million records extracted - with your
>> help we can bring this up to 6 million.
>
> I've gone through 20 or so, and I can't help thinking that a bit more
> automation wouldn't go amiss - most of them differed from the target
> only by having a space between words - "Expense type" instead of
> "ExpenseType".
Hm, I've deleted these clear duplicates, but we're still at 1.1k
unlinked entries, unfortunately.
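
For reference, a minimal sketch of the kind of automation that catches
these clear duplicates: headers that differ only in spacing, case or
punctuation. This is not the tool's actual code; the function names
normalize_header and group_headers are made up for illustration, and
the sample headers are just the ones mentioned in this thread.

```python
import re
from collections import defaultdict


def normalize_header(name: str) -> str:
    """Reduce a column header to a comparison key.

    Lowercases the text and drops everything except letters and digits,
    so "Expense type", "ExpenseType" and "expense_type" share one key.
    """
    return re.sub(r"[^a-z0-9]+", "", name.lower())


def group_headers(headers):
    """Group raw headers by their normalized key, preserving input order."""
    groups = defaultdict(list)
    for header in headers:
        groups[normalize_header(header)].append(header)
    return dict(groups)


# Hypothetical sample of raw column headers.
sample = ["Expense type", "ExpenseType", "Expense Type Code", "expense_type"]
groups = group_headers(sample)

for key, variants in groups.items():
    print(key, variants)
```

Note that "Expense Type Code" ends up under its own key, so code fields
and text fields stay separate as in point e) above. Anything that is
more than a spacing or case variant still needs a human decision.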
Thanks for your help,
- Friedrich