[openspending-dev] Data upload failed

Friedrich Lindenberg friedrich.lindenberg at okfn.org
Sat Feb 4 23:59:43 UTC 2012


Right, but this is a big difference between Refine and OpenSpending:
Refine is a data-cleansing tool so one of the priorities is that it
can parse almost any kind of input, as long as there is some basic way
of guessing its content (in fact, you can even fine-tune the amount of
guessing it will do by disabling type detection etc.).

OpenSpending is quite different in that we are explicitly not handling
data cleansing: there is a long and somewhat painful list [1] of
formatting standards your data needs to conform to in order to be
loadable. The reason behind this is that you really do need data that
is consistent to perform any kind of analysis.

At the same time, in order to do cleansing, you need something that is
at least as powerful as Refine - and OS just cannot provide that. So
I'd much rather send you back to your data-wrangling tool with some
useful (!, not gonna cite RFCs) messages than attempt to do half-assed
data cleansing on the fly.

Of course, CSV/TSV is an edge case here, but the general rule applies:
OS is strict in what it accepts, so that the outcome will remain
usable.
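
As a rough illustration of "strict, but with useful messages", here is
a minimal sketch in Python. It is not the OpenSpending or messytables
code: check_strict_csv and the message wording are made up for this
example, and it only checks for comma separation and a consistent
column count.

```python
import csv
import io


def check_strict_csv(text, max_errors=10):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only, not part of
    OpenSpending or messytables.
    """
    problems = []
    # strict=True makes the csv module raise csv.Error on malformed
    # quoting instead of silently guessing.
    rows = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        header = next(rows, None)
        if header is None:
            return ["The file is empty; expected a header row."]
        # A single header cell containing tabs almost certainly means TSV.
        if len(header) == 1 and "\t" in header[0]:
            return [
                "Line 1: this looks tab-separated. Please re-export "
                "the file with commas as the separator."
            ]
        for row in rows:
            if len(row) != len(header):
                problems.append(
                    "Line %d: found %d columns, but the header has %d."
                    % (rows.line_num, len(row), len(header))
                )
                if len(problems) >= max_errors:
                    break
    except csv.Error as exc:
        problems.append(
            "Line %d: could not parse (%s)." % (rows.line_num, exc)
        )
    return problems
```

The point of the sketch is that nothing gets repaired: the file either
loads as-is or the user is told which line to fix in their own tool.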

tl;dr - I don't think Postel's law applies to databases.

- Friedrich

[1] http://openspending.org/help/data-cleansing.html#some-common-problems


On Sat, Feb 4, 2012 at 11:46 PM, Gregor Aisch <gregor.aisch at okfn.org> wrote:
>
> Also, generally speaking, when designing a system for uploading data, I'd
> always prefer the try-to-read-everything strategy over forcing
> (inexperienced) users to convert data to some nerdy RFC standards. In fact,
> 80% of our users will not try to upload their data again after facing an
> error message like "sorry, but your data is not in the right format (=you
> suck). please read the RFC 4180 for more details (=just give it up,
> stupid).".
>
> I really love how Refine handles data imports.
>
> On 05.02.2012 at 00:24, Friedrich Lindenberg wrote:
>
> Hey Gregor,
>
> thanks for trying this but I'm not sure we want to support this - I've
> actually limited the set of things messytables will do in OpenSpending
> intentionally because I think that when data is handed to OS, it
> should already be formatted properly (which includes using actual
> CSV).
>
> I'll soon have to do a lot of fixes against messytables for the DGU
> spend, but suspect enabling all this in OpenSpending may actually lead
> to more ambiguity than just having a clear rule
> (http://tools.ietf.org/html/rfc4180)
>
> What do you think?
>
> - Friedrich
>
> On Sat, Feb 4, 2012 at 6:42 PM, Gregor Aisch <gregor.aisch at okfn.org> wrote:
>
> Tried to upload some spending data to openspending to find out that the
> automated CSV recognition failed to detect the tab-separated table.
>
> Seems to be a bug in messytables so I added a new issue. Who's maintaining
> that package?
>
> https://github.com/okfn/messytables/issues/3
>
> _______________________________________________
> openspending-dev mailing list
> openspending-dev at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/openspending-dev



