[ckan-discuss] CKAN integration with gridworks / refine

Tim McNamara paperless at timmcnamara.co.nz
Sat Nov 27 21:31:31 GMT 2010

On 28 November 2010 07:45, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> How could this work? I'd be interested to hear what others think here
> but here's a couple of initial ideas.
> Rufus
> ### Scenario 1
> 1. User installs Refine and CKAN extension for refine
> 2. On booting refine and asked to load data they can choose from any
> data package on CKAN.net (or any other CKAN instance)
> 3. They edit the dataset on Refine
> 4. On save (or perhaps as a separate option) they are prompted as to
> whether they wish to sync the dataset back to CKAN (either as a new
> package or as a new resource on the existing package)
> NB: for the dataset sync back some form of "CKAN" storage would be
> required (we already have storage.ckan.net running but a closer
> integration would be required)

This has lots of promise. It's how I have been using Refine with open data
so far. When I've wanted to republish data, I've used GitHub[1], but the
extra time this takes means that I've only done it once.
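
A rough sketch of what the "sync back as a new resource" step in Scenario 1
could look like. It assumes a CKAN instance exposing the JSON Action API
(`/api/3/action/resource_create`), which may differ from the API available
on a given instance. The instance URL, API key, package id and file URL are
all placeholders, and the cleaned export is assumed to be hosted somewhere
already (for example on the storage service mentioned above).

```python
import json
import urllib.request

# Placeholder: swap in a real CKAN instance.
CKAN_URL = "https://ckan.example.org"


def build_resource_create_request(ckan_url, api_key, package_id, file_url, name):
    """Build (but do not send) a POST that adds a resource to a package."""
    payload = {
        "package_id": package_id,  # existing package to attach the resource to
        "url": file_url,           # where the cleaned export is hosted
        "name": name,
        "format": "CSV",
    }
    return urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/resource_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


# All argument values here are hypothetical.
request = build_resource_create_request(
    CKAN_URL,
    "my-api-key",
    "example-package",
    "https://example.org/cleaned-export.csv",
    "Cleaned in Refine",
)
# Sending it would be one more call: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the "prompt the user
before syncing" step easy to slot in between the two.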

> ### Scenario 2
> 1. User visits a package on CKAN.net (or another CKAN instance)
> 2. There is a button on the page "View and edit this dataset in Google
> Refine"
> 3. Click button -- ask them if they have Google refine installed
>  * Yes: instructions for loading dataset into refine
>  * No: load dataset in hosted version of google refine (we could run this)
> 4. User edits dataset and hits save. As in previous scenario they are
> prompted to sync the dataset.

This is less viable. Google Refine is very memory-heavy: it doesn't use a
database, but keeps the entire dataset in an in-memory data store. Many of
the datasets in CKAN cripple my instances of Refine.

Also, consider scenario 3 (possibly as scenario 1.b.):

1. Analyst has cleaned-up spreadsheet in Refine
2. CKAN package doesn't exist
3. Analyst creates a new package based on her current data
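
Scenario 3 could be sketched roughly as follows, assuming the CKAN Action
API's `package_create` call and its `name`, `title`, `notes` and `resources`
fields. The helper names (`slugify`, `new_package_payload`) and all example
values are hypothetical, and the export is again assumed to be hosted at a
URL already.

```python
import re


def slugify(title):
    """Turn a human-readable title into a lowercase, URL-safe package name."""
    return re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")


def new_package_payload(title, notes, export_url):
    """Build the JSON-ready dict that a package_create call would be sent."""
    return {
        "name": slugify(title),
        "title": title,
        "notes": notes,
        "resources": [
            {"url": export_url, "format": "CSV", "name": "Refine export"},
        ],
    }


# Hypothetical example values.
payload = new_package_payload(
    "Example Dataset (cleaned)",
    "Cleaned up in Google Refine.",
    "https://example.org/example-dataset.csv",
)
```

The payload would then be POSTed as JSON to `/api/3/action/package_create`
on the target instance, with an API key in the `Authorization` header.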

Another option for integration would be to create a reconciliation service
inside of Google Refine. That would be much more work, but potentially very

[1] https://github.com/timClicks/nz-coal-data