[ckan-discuss] extending the Google Refine CKAN extension

Rufus Pollock rufus.pollock at okfn.org
Fri May 11 04:16:30 BST 2012


On 8 May 2012 19:15, Fadi Maali <fadi.maali at deri.org> wrote:
> Hi all,
>
> This email is about extending the Google Refine CKAN extension available at https://github.com/fadmaa/grefine-ckan-storage-extension and described at: http://ckan.org/2011/07/05/google-refine-extension-for-ckan/
>
>
> Rufus suggested that with the release of the data store API, we can make the Google Refine CKAN Extension write directly to the store instead of using the file upload API.
> So the idea is to serialize the data in Google Refine as JSON and write it to the store API.
> We discussed two alternatives to serialize the data in JSON:
> 1. the simpler one is to serialize the data in Google Refine as JSON using a direct mapping from the tabular representation, i.e. the data will look something like:
> {"rows": [
>   {"column1-name": "value-1", "column2-name": "value-2", ...}
> ]}
> 2. to model the data in JSON-LD based on the model defined using the RDF Extension of Google Refine (if the model is defined)
>
> Option 1 is currently realized by uploading CSV data, as thedatahub.org will serialize CSV data directly. See for example http://thedatahub.org/dataset/states (the CSV resource at: http://thedatahub.org/dataset/states/resource/4336bf82-317f-4f8c-aa3b-2898a0dfe58b ). So I am not sure implementing it in the extension is worthwhile.

The only thing that won't work well is nested data of any kind
(e.g. JSON). There is also the question of how you *update* existing
data in the DataStore: uploading a CSV does not work so well there,
because it results in a new data resource and a new DataStore table
(though we could change CKAN to allow re-importing a new file into
an existing DataStore table).
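
For what it's worth, here is a minimal sketch of what writing rows
straight into the DataStore could look like, assuming a CKAN instance
that exposes the DataStore action API (datastore_create /
datastore_upsert); the resource id, API key and field names below are
just placeholders:

import json
import requests

API_KEY = "my-api-key"           # placeholder
RESOURCE_ID = "resource-uuid"    # placeholder: an existing CKAN resource

# Create a DataStore table for the resource and load the serialized rows.
payload = {
    "resource_id": RESOURCE_ID,
    "fields": [
        {"id": "column1-name", "type": "text"},
        {"id": "column2-name", "type": "text"},
    ],
    "records": [
        {"column1-name": "value-1", "column2-name": "value-2"},
    ],
}

response = requests.post(
    "http://thedatahub.org/api/3/action/datastore_create",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json", "Authorization": API_KEY},
)
response.raise_for_status()
print(response.json()["success"])

Later writes could then go through datastore_upsert against the same
resource id, adding or updating records in the existing table rather
than creating a new resource each time (again assuming the action
API).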

Nevertheless, just being able to upload a CSV file export from Refine
would be very nice :-)
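
Even that could be scripted, e.g. something along these lines,
assuming a recent CKAN with the FileStore and the resource_create
action (at the time of writing the extension goes through the older
file upload / storage API); the dataset id, API key and filename are
placeholders:

import requests

API_KEY = "my-api-key"     # placeholder
PACKAGE_ID = "states"      # placeholder: dataset to attach the export to

# Upload a CSV exported from Refine as a new resource on the dataset.
with open("refine-export.csv", "rb") as f:
    response = requests.post(
        "http://thedatahub.org/api/3/action/resource_create",
        data={"package_id": PACKAGE_ID, "name": "Refine export", "format": "CSV"},
        files={"upload": f},
        headers={"Authorization": API_KEY},
    )
response.raise_for_status()
print(response.json()["result"]["id"])  # id of the newly created resource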

> I am also not a big fan of option 2, as I am not a big fan of JSON-LD, and in fact the extension was built to encourage people to share RDF data. Getting JSON-LD from RDF can be easy (for example, this Java library does it: https://github.com/tristan/jsonld-java).

Understood, though the bigger value here is in storing the JSON-LD
context/mapping.

> I thought about a third option, which is allowing users to use Google Refine's templating functionality to export data in a customized JSON shape and write it directly to theDataHub.org. This will allow people to easily share JSON data based on a customized (and hopefully more useful) JSON schema. If you are interested in more details about templating in Google Refine, see: http://code.google.com/p/google-refine/wiki/ExportAsYAML

This sounds nice, and one could still turn the JSON into JSON-LD by
adding a context/mapping.
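
To make that concrete, here is one hypothetical shape for the exported
rows with a context added (the vocabulary IRIs are made up purely for
illustration):

import json

# The @context maps the column names to (made-up) RDF property IRIs, so the
# same document can be read both as plain JSON and as linked data.
jsonld_doc = {
    "@context": {
        "column1-name": "http://example.org/vocab/column1",
        "column2-name": "http://example.org/vocab/column2",
    },
    "@graph": [
        {"column1-name": "value-1", "column2-name": "value-2"},
    ],
}

print(json.dumps(jsonld_doc, indent=2))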

Rufus

> I started working on the last option. I'd be really glad to hear your opinions.
>
> Best regards,
> Fadi


