[Okfn-ca] Alternatives to OpenRefine | Fwd: School-of-data Digest, Vol 16, Issue 2

Peder Jakobsen pjakobsen at gmail.com
Tue Jul 9 15:05:26 UTC 2013


On 2013-07-09, at 8:39 AM, Diane Mercier <diane.mercier at gmail.com> wrote:

> It may depend on exactly what you want. People regularly post requests on the OpenRefine list for functionality, and the answer tends to be 'look at dedicated ETL solutions'. Open source ETL solutions mentioned include Talend Open Studio and Pentaho Data Integration.
> 
> I don't think these are quite 'alternatives' to Refine, but it may depend on what exactly you want to do and the skills/resources you have available.

Owen is right: the answer to this question depends very much on what skills and resources are available.

My personal preference is to use a scripting language for all ETL work.  There is no bizarre corner case or integration problem that cannot be easily dealt with in a simple script.  Python is an obvious choice: tasks that would be a hassle with a tool like OpenRefine or in, say, Java are quick, easy, and even somewhat enjoyable to work on in Python.
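
As a rough illustration of the kind of simple script meant here, a minimal sketch using only the Python standard library might look like the following.  The column names (name, city), the sample rows, and the clean_rows and to_csv helpers are all hypothetical, made up for this example rather than taken from any real dataset or project.

```python
import csv
import io

# Hypothetical messy input: stray whitespace, inconsistent case, a duplicate row.
RAW = """name,city
 Alice ,OTTAWA
Bob,  montreal
alice,Ottawa
"""


def clean_rows(text):
    """Extract rows from CSV text, normalize them, and drop duplicates."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Transform: trim whitespace and normalize capitalization.
        record = {key: value.strip().title() for key, value in row.items()}
        fingerprint = tuple(record.values())
        if fingerprint in seen:
            continue  # skip rows that are identical after normalization
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned


def to_csv(rows):
    """Load step: serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "city"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(clean_rows(RAW)))
```

Here the three input rows collapse to two, because the first and third become identical once whitespace and capitalization are normalized.  A real job would read from and write to files or a database instead of in-memory strings, but the extract, transform, load shape stays the same.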

ETL is my full-time job, so I grant that not everyone has the luxury of figuring out all the tricks of data manipulation with something like Python or Ruby.  But where possible, it's an investment worth making, and it will pay big dividends over the long term for any organization that needs to aggregate data.

Cheers,

Peder Jakobsen
Consultant, OKFN CKAN & data.gc.ca

