[openspending-dev] Diff for spending CSVs
anders.pedersen at okfn.org
Thu Jun 27 20:12:20 UTC 2013
Great work and thanks for sharing! Making it easier to load updates to
transactional spending datasets into OpenSpending is a really useful improvement.
On 27 June 2013 12:55, Friedrich Lindenberg <friedrich at pudo.org> wrote:
> This is really cool, David!
> After a quick look, it looks to me like there's nothing really
> spend-specific in there: have you considered pinging @onyxfish about
> pushing this into csvkit? Would make a valuable contribution!
> - Friedrich
> On Thu, Jun 27, 2013 at 6:50 PM, David Read <
> david.read at hackneyworkshop.com> wrote:
>> I've written a tool to run in OpenSpending ETL for discarding the
>> parts of the CSV of spending transactions that are already loaded.
>> This is useful for the data.gov.uk work where the CSV is 4GB, and
>> updated daily from source data, but of that, there are only a tiny
>> number of new/changed rows that need loading into the OpenSpending
>> database each day.
>> Making this was a suggestion of Pudo's:
>> > find out how to make diff emit only the lines that have been added and
>> > use that to generate incremental spendingsource files.
>> The code is in our ETL here:
>> https://github.com/openspending/dpkg-uk25k/blob/master/spend_diff.py -
>> feel free to put it into the core OpenSpending code if that makes
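For anyone reading along, here is a minimal sketch of the kind of row-level
diff David describes above. It is not the code in spend_diff.py, just an
illustration under a few assumptions: both files share the same columns, a
row counts as "already loaded" only if every field matches exactly, and the
names row_digest and new_or_changed_rows, along with the sample data, are
made up for this example.

```python
import csv
import hashlib
import io


def row_digest(row):
    """Return a fixed-size fingerprint of one CSV row (a list of strings)."""
    # The unit separator keeps ["ab", "c"] distinct from ["a", "bc"].
    joined = "\x1f".join(row)
    return hashlib.sha1(joined.encode("utf-8")).digest()


def new_or_changed_rows(old_file, new_file):
    """Yield the new header, then each row of new_file absent from old_file."""
    old_reader = csv.reader(old_file)
    next(old_reader, None)  # skip the old header
    # Keep one 20-byte digest per old row instead of the row itself.
    seen = {row_digest(row) for row in old_reader}

    new_reader = csv.reader(new_file)
    header = next(new_reader, None)
    if header is not None:
        yield header
    for row in new_reader:
        if row_digest(row) not in seen:
            yield row


# Hypothetical sample data: row 2 changed, row 3 is new.
old = io.StringIO("id,amount\n1,10.00\n2,20.00\n")
new = io.StringIO("id,amount\n1,10.00\n2,25.00\n3,5.00\n")
for row in new_or_changed_rows(old, new):
    print(",".join(row))
# prints:
# id,amount
# 2,25.00
# 3,5.00
```

A changed row shows up as a new row here, which matches the "new/changed
rows" wording above. Rows that were deleted from the source are not reported,
and the set of digests still has to fit in memory.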
Community Coordinator | skype: anpehej | @anpe <https://twitter.com/>
The Open Knowledge Foundation <http://okfn.org/>
Empowering through Open Knowledge