[ckan-discuss] CKAN - Google Refine integration

William Waites ww at styx.org
Sun May 15 13:18:37 BST 2011


This would have been useful to me too over the last week, as I transliterated a large dataset into RDF, though not using Refine.  In the end I ended up using archive.org, which was painful, not least because the dataset is broken into 653 parts... I'm still unsure how to adequately deal with such datasets in CKAN either...


"Maali, Fadi" <fadi.maali at deri.org> wrote:

>Hi Rufus/All,
>
>Sorry for the late reply... 
>
>The storage features described below look sufficient. I was waiting for them to become available through demo.ckan.net so that I could experiment with them. Any news on that?
>
>Thanks,
>Fadi
>
>> -----Original Message-----
>> From: okfn.rufus.pollock at gmail.com
>> [mailto:okfn.rufus.pollock at gmail.com] On Behalf Of Rufus Pollock
>> Sent: 13 May 2011 19:48
>> To: Maali, Fadi; Richard Cyganiak
>> Cc: CKAN discuss
>> Subject: Re: [ckan-discuss] CKAN - Google Refine integration
>> 
>> On 29 April 2011 20:44, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>> > On 26 April 2011 11:50, Maali, Fadi <fadi.maali at deri.org> wrote:
>> > [...]
>> >
>> >> 3. using the "RDF Extension for Google Refine" (available at:
>> >> http://lablab.linkeddata.deri.ie/2010/grefine-rdf-extension/ ) the
>> >> data can be exported as RDF
>> >> 4. The resulting RDF data is saved back to CKAN and linked to the
>> >> respective package.
>> >>
>> >> It is actually the last step that is still missing some details
>> >> and requires discussion. Our tentative ideas about it:
>> >> - result data is saved to storage.ckan.net (we need help from the
>> >> CKAN guys here)
>> 
>> @Fadi: are the new storage features I mentioned (see below) sufficient
>> for what you want to do (i.e. direct upload of cleaned data to CKAN)?
>> I think this is a really killer feature, and if there is anything I can
>> do to help, please let me know.
>>
>> Rufus
>> 
>> > Indeed it does (sorry for the slow reply: I'd been meaning to reply
>> > earlier, but updating the existing storage code took a bit longer
>> > than anticipated). The good news is that this functionality already
>> > exists (it just hasn't been properly announced before), thanks to
>> > the integrated storage extension for CKAN:
>> > <https://bitbucket.org/okfn/ckanext-storage/>
>> >
>> > This extension adds:
>> >
>> >  * Some new methods to the CKAN API for dealing with storage
>> >  * An /upload page in the web interface for doing file uploads
>> >
>> > The upload page is here (please don't just try this out to
>> > experiment, as you won't be able to delete :-) -- for experimentation
>> > we'll be making demo.ckan.net available in the next couple of days!):
>> >
>> > <http://ckan.net/storage/upload>
>> >
>> > Here's a demo uploaded file I did earlier:
>> >
>> > <http://ckan.net/storage/f/file/8630a664-0ae4-485f-99c2-126dae95653a>
>> >
>> > Having uploaded a file, you'd then use that URL in a resource (we're
>> > working to get this integrated into the package editing workflow --
>> > the hard work is done, so this should be quite simple ...).
>> >
>> > However, you may not want to upload by hand, so there is also an
>> > auth API which gives you the relevant headers and details for
>> > uploading directly to the storage backend (Google Storage at the
>> > moment).
>> >
>> > WARNING: this is still a bit alpha and subject to change (please
>> > report bugs to ckan-dev or to trac.ckan.org)
>> >
>> > /api/storage/auth/request/{key}
>> >
>> > Docstring:
>> >
>> > Provide authentication information for a request so a client can
>> > interact with backend storage directly.
>> >
>> >        :param label: key.
>> >        :param kwargs: sent either via query string for GET or
>> >            json-encoded dict for POST. Interpreted as HTTP headers
>> >            for the request plus an (optional) method parameter
>> >            (being the HTTP method).
>> >
>> >            Examples of headers are:
>> >
>> >                Content-Type
>> >                Content-Encoding (optional)
>> >                Content-Length
>> >                Content-MD5
>> >                Expect (should be '100-Continue')
>> >
>> >        :return: a json hash containing various attributes, including
>> >            a headers dictionary containing an Authorization field
>> >            which is good for 15m.
>> >
>> > The current convention is that you should prefix your 'key' with
>> > "file/" (the idea being that we are uploading to a 'file' directory).
>> > We may bake this into the system pretty soon ...
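
The auth flow described above could be sketched in client code roughly like this. This is only a sketch: the base URL, key, and header values are placeholders, and only the request construction is shown -- actually POSTing it requires authenticating with your CKAN API key.

```python
import json

CKAN = "http://ckan.net"  # placeholder base URL for the CKAN instance


def build_auth_request(key, headers, method="PUT"):
    """Build the URL and JSON body for /api/storage/auth/request/{key}.

    Per the docstring above, the kwargs sent are the intended HTTP
    headers for the upload plus an optional 'method' parameter.
    """
    url = "%s/api/storage/auth/request/%s" % (CKAN, key)
    body = json.dumps(dict(headers, method=method))
    return url, body


# Keys are conventionally prefixed with "file/":
url, body = build_auth_request(
    "file/mydata.rdf",
    {"Content-Type": "application/rdf+xml",
     "Content-Length": "12345",
     "Expect": "100-Continue"},
)
# POST `body` to `url` (with your CKAN API key for authentication);
# the JSON response contains a headers dict whose Authorization field
# is good for 15 minutes and can be used to PUT the file directly to
# the storage backend.
```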
>> >
>> >> - the result data is considered a new resource of the existing
>> >> package. This is automatically registered through the CKAN API.
>> >> - along with the RDF data we save the JSON representation of all
>> >> Google Refine operations that have been applied to the original
>> >> data, i.e. anyone starting with the CSV file on CKAN can re-apply
>> >> the operations using the JSON representation in Google Refine to
>> >> get an exact copy of the RDF data
>> >>
>> >> Does that look reasonable? Any feedback?
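
The registration step sketched in those bullets -- adding the exported RDF and the Refine operation history as resources on the existing package -- might look roughly like this in client code. A sketch only: the helper, package name, and URLs are hypothetical, and the package dict shape assumed here is the legacy CKAN REST API's (a PUT of the full package dict to /api/rest/package/{name}).

```python
def add_refine_resources(package, rdf_url, json_url):
    """Return a copy of a CKAN package dict with two new resources:
    the RDF export and the Google Refine operation history."""
    resources = list(package.get("resources", []))
    resources.append({
        "url": rdf_url,
        "format": "RDF",
        "description": "RDF export from Google Refine",
    })
    resources.append({
        "url": json_url,
        "format": "JSON",
        "description": "Google Refine operation history "
                       "(re-apply to the CSV to reproduce the RDF)",
    })
    return dict(package, resources=resources)


# Hypothetical package and storage URLs for illustration:
pkg = {"name": "example-package", "resources": []}
updated = add_refine_resources(
    pkg,
    "http://ckan.net/storage/f/file/mydata.rdf",
    "http://ckan.net/storage/f/file/mydata-operations.json",
)
# PUT `updated` (JSON-encoded, with your API key) back to
# /api/rest/package/example-package to register both resources.
```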
>> >
>> > This seems great, and I really like the approach where you save the
>> > Refine JSON alongside the new resource.
>> >
>> > Rufus
>> >
>> 
>> 
>> 
>> --
>> Co-Founder, Open Knowledge Foundation
>> Promoting Open Knowledge in a Digital Age
>> http://www.okfn.org/ - http://blog.okfn.org/
>_______________________________________________
>ckan-discuss mailing list
>ckan-discuss at lists.okfn.org
>http://lists.okfn.org/mailman/listinfo/ckan-discuss
