[ckan-discuss] CKAN - Google Refine integration

Richard Cyganiak richard at cyganiak.de
Tue Apr 26 21:31:22 BST 2011


Hi Monika,

On 26 Apr 2011, at 13:24, Monika Solanki wrote:
> An important part of the last step, IMHO, is the provision of VoID files for the new RDF datasets. I assume the datasets will become part of the LOD cloud at some point, so there is mileage in having their VoID descriptions alongside the package metadata provided by CKAN at http://semantic.ckan.net.

Well ...

From a CKAN perspective, VoID is just another machine-readable format for the metadata that already is (or should be) stored in CKAN.

In fact, the RDF datasets that are listed in CKAN already have quite a bit of VoID information available, via semantic.ckan.net:

http://semantic.ckan.net/record/dcc6715c-bf94-4a89-bbf3-35933da795a5.ttl

This is because semantic.ckan.net is aware of the special CKAN conventions that we defined for describing RDF datasets:

http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

(we should really move that page into the CKAN wiki ...)

> Would the dataset providers be notified of their RDF datasets once they become available? It would be worth pointing them to the VoID editor.

CKAN's package entry form and the VoID editor are pretty much the same thing. The only difference is that one generates an entry in the CKAN database, while the other generates an RDF file that you have to upload somewhere.

So to me it makes more sense to just keep all the metadata that describes the dataset in CKAN, rather than putting it into a separate off-site file.
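
To illustrate the point, here is a minimal sketch of how a VoID description could be derived from a package record, rather than maintained separately. It is not how semantic.ckan.net is implemented. The function name and the shape of the package dict are assumptions made for illustration: I assume a "triples" entry in the extras and a resource with format "api/sparql", in the spirit of the conventions page above.

```python
def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def package_to_void(package):
    """Render a CKAN-style package dict as a small VoID description in Turtle.

    The dict layout (name, title, url, extras, resources) is an assumption
    for this sketch, not a confirmed CKAN API response format.
    """
    extras = package.get("extras", {})
    header = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        "<#%s> a void:Dataset ;" % package["name"],
    ]
    # Collect one predicate-object pair per piece of metadata we find.
    props = ['dcterms:title "%s"' % escape_literal(package["title"])]
    if package.get("url"):
        props.append("foaf:homepage <%s>" % package["url"])
    if "triples" in extras:
        props.append("void:triples %d" % int(extras["triples"]))
    for resource in package.get("resources", []):
        # Assumed convention: SPARQL endpoints are resources with this format.
        if resource.get("format") == "api/sparql":
            props.append("void:sparqlEndpoint <%s>" % resource["url"])
    body = "    " + " ;\n    ".join(props) + " ."
    return "\n".join(header + [body])


if __name__ == "__main__":
    example = {
        "name": "example-dataset",
        "title": "Example Dataset",
        "url": "http://example.org/",
        "extras": {"triples": "1000"},
        "resources": [{"url": "http://example.org/sparql", "format": "api/sparql"}],
    }
    print(package_to_void(example))
```

Everything the VoID file says comes out of the package record, which is why a second, hand-edited copy seems redundant to me.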

Best,
Richard


> I assume it may be difficult to fill in values for many of the VoID attributes automatically from their CSV/Excel representations.
> 
> Monika
> 
> On 26/04/11 11:50, Maali, Fadi wrote:
>> Hi all,
>> 
>> This has been discussed here before and was also discussed in the last
>> CKAN online community meetup.
>> 
>> I will describe here a scenario Richard Cyganiak and I are working on.
>> Our goal is to help publish datasets currently available in CSV or
>> Excel format as Linked Data.
>> 
>> 1. Navigate the packages available in a CKAN catalogue from within
>> Google Refine. This is currently implemented as an extension to Google
>> Refine and uses the RDF representation of CKAN catalogues (such as the
>> ones available at http://semantic.ckan.net). Any package that has a
>> resource understandable by Google Refine (e.g. CSV, Excel, TSV...) can
>> be opened as a Google Refine project.
>> 2. Google Refine is used to conduct any data cleaning and transformation
>> required.
>> 3. Using the "RDF Extension for Google Refine" (available at:
>> http://lablab.linkeddata.deri.ie/2010/grefine-rdf-extension/ ) the data
>> can be exported as RDF.
>> 4. The resulting RDF data is saved back to CKAN and linked to the
>> respective package.
>> 
>> It is actually the last step that is still missing some details and
>> requires discussion. Our tentative ideas about it:
>> - the resulting data is saved to storage.ckan.net (we need help from
>> the CKAN guys here)
>> - the resulting data is considered a new resource of the existing
>> package. This is automatically registered through the CKAN API.
>> - along with the RDF data we save the JSON representation of all Google
>> Refine operations that have been applied to the original data, i.e.
>> anyone starting with the CSV file on CKAN can re-apply the operations
>> using the JSON representation in Google Refine to get an exact copy of
>> the RDF data
>> 
>> Does that look reasonable? Any feedback?
>> 
>> Regards,
>> Fadi
>> 
>> _______________________________________________
>> ckan-discuss mailing list
>> ckan-discuss at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-discuss
> 
> 
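
For reference, the last step of Fadi's scenario (the RDF export and the Refine operation history both become resources of the existing package) could look roughly like the sketch below. The function name, the package dict layout and the format strings are hypothetical, chosen only for illustration. The actual upload to storage.ckan.net and the CKAN API call are left out because their details are still under discussion in this thread.

```python
import copy


def with_rdf_export(package, rdf_url, operations_url):
    """Return a copy of a package dict with two extra resources attached.

    One resource points at the RDF export, the other at the JSON history of
    Google Refine operations that produced it. The dict layout and the
    format values are assumptions for this sketch.
    """
    updated = copy.deepcopy(package)  # leave the caller's record untouched
    updated.setdefault("resources", []).extend([
        {
            "url": rdf_url,
            "format": "application/rdf+xml",
            "description": "RDF export produced with Google Refine",
        },
        {
            "url": operations_url,
            "format": "application/json",
            "description": "Google Refine operations used to produce the RDF export",
        },
    ])
    return updated


if __name__ == "__main__":
    original = {
        "name": "example-dataset",
        "resources": [{"url": "http://example.org/data.csv", "format": "text/csv"}],
    }
    updated = with_rdf_export(
        original,
        "http://example.org/data.rdf",
        "http://example.org/operations.json",
    )
    for resource in updated["resources"]:
        print(resource["format"], resource["url"])
```

Keeping the CSV, the RDF and the operation history side by side in one package is what would let someone re-apply the operations and reproduce the RDF data.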
