[okfn-labs] Quick and dirty analytics on large CSVs - what's the best approach?

Tom Morris tfmorris at gmail.com
Thu Apr 11 18:43:28 UTC 2013


On Thu, Apr 11, 2013 at 2:35 PM, Emanuil Tolev <emanuil at cottagelabs.com> wrote:

> Would Open Refine ( http://openrefine.org/ ) be of any use at all? I've
> used it on tens of megabytes, but certainly not thousands... it will also load
> the whole thing in memory I believe, and you need to have the file locally.
> On the other hand, it provides a nice UI for faceting, mass-editing and has
> always been blazingly fast to apply most of its operations, even similar
> string finding via edit distance or more complicated algorithms (when
> trying to find misspellings of strings, e.g. organisation names). It'd be a
> nice test to see how it scales.
>

I'm the project leader, so I certainly would have been inclined to
recommend it if I thought it'd be a good tool for the job, but I think this
is probably on the large side for Refine in its current incarnation.

Tom