[ckan-discuss] CKAN - Google Refine integration

Maali, Fadi fadi.maali at deri.org
Tue Apr 26 11:50:53 BST 2011


Hi all,

This has been discussed here before and also came up at the last CKAN
online community meetup.

I will describe here a scenario Richard Cyganiak and I are working on.
Our goal is to help publish datasets currently available in CSV or
Excel format as Linked Data. The workflow has four steps:

1. Navigate the packages available in a CKAN catalogue from within
Google Refine. This is currently implemented as an extension to Google
Refine and uses the RDF representation of CKAN catalogues (such as the
ones available at http://semantic.ckan.net). Any package that has a
resource understandable by Google Refine (e.g. CSV, Excel, TSV...) can
be opened as a Google Refine project.
2. Google Refine is used to conduct any data cleaning and transformation
required.
3. Using the "RDF Extension for Google Refine" (available at:
http://lablab.linkeddata.deri.ie/2010/grefine-rdf-extension/ ), the data
can be exported as RDF.
4. The resulting RDF data is saved back to CKAN and linked to the
respective package.
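
As a rough illustration of the filter in step 1, here is a minimal
Python sketch. It assumes a package is already available as a
dictionary with a `resources` list whose entries carry `format` and
`url` keys. That shape, the `REFINE_FORMATS` set, and the function name
are illustrative assumptions, not the extension's actual code (which
works from the RDF representation).

```python
# Hypothetical sketch: pick out the resources of a package that
# Google Refine can open. The format list is an assumption and is
# not exhaustive.
REFINE_FORMATS = {"csv", "tsv", "xls", "xlsx", "excel"}


def refine_readable_resources(package):
    """Return the resources whose declared format is in REFINE_FORMATS."""
    readable = []
    for resource in package.get("resources", []):
        # Format strings in catalogues are free text, so normalise
        # case and whitespace before comparing.
        fmt = (resource.get("format") or "").strip().lower()
        if fmt in REFINE_FORMATS:
            readable.append(resource)
    return readable


# Made-up package used only to exercise the function.
example_package = {
    "name": "example-dataset",
    "resources": [
        {"url": "http://example.org/data.csv", "format": "CSV"},
        {"url": "http://example.org/report.pdf", "format": "PDF"},
    ],
}

openable = refine_readable_resources(example_package)
```

A package would be offered for opening in Google Refine only when the
returned list is non-empty.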

It is actually the last step that is still missing some details and
requires discussion. Our tentative ideas about it:
- The resulting data is saved to storage.ckan.net (we need help from the
CKAN guys here).
- The resulting data is considered a new resource of the existing
package. This is automatically registered through the CKAN API.
- Along with the RDF data, we save the JSON representation of all Google
Refine operations that have been applied to the original data, i.e.
anyone starting with the CSV file on CKAN can re-apply the operations
using the JSON representation in Google Refine to get an exact copy of
the RDF data.
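
To make the ideas above concrete, here is a minimal Python sketch of
how the updated package could be prepared before it is sent to the CKAN
API. It only builds the JSON payload: the HTTP call, the endpoint, and
authentication are deliberately left out. The function name, the
dictionary layout, the `description` texts, and the example.org URLs
are all assumptions made for illustration.

```python
import copy
import json


def add_rdf_resources(package, rdf_url, operations_url):
    """Return a copy of `package` with two extra resources appended:
    the RDF export and the Google Refine operation history."""
    updated = copy.deepcopy(package)  # leave the caller's dict untouched
    updated.setdefault("resources", []).extend([
        {
            "url": rdf_url,
            "format": "RDF",
            "description": "RDF export produced with Google Refine",
        },
        {
            "url": operations_url,
            "format": "JSON",
            "description": "Google Refine operations used for the RDF export",
        },
    ])
    return updated


# Made-up package and storage locations, for illustration only.
package = {
    "name": "example-dataset",
    "resources": [{"url": "http://example.org/data.csv", "format": "CSV"}],
}
updated = add_rdf_resources(
    package,
    rdf_url="http://example.org/storage/data.rdf",
    operations_url="http://example.org/storage/data-operations.json",
)

# Serialised form that a client would submit when updating the package.
payload = json.dumps(updated)
```

Keeping the operation history as a sibling resource of the RDF file
means both stay attached to the same package as the original CSV, so
the transformation can be re-applied from the catalogue entry alone.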

Does that look reasonable? Any feedback?

Regards,
Fadi


