[ckan-discuss] CKAN - Google Refine integration

Maali, Fadi fadi.maali at deri.org
Sun May 15 12:09:11 BST 2011


Hi Rufus/All,

Sorry for the late reply.

The storage features described below look sufficient. I was waiting for them to be available through demo.ckan.net so that I can experiment with them. Any news on that?

Thanks,
Fadi

> -----Original Message-----
> From: okfn.rufus.pollock at gmail.com
> [mailto:okfn.rufus.pollock at gmail.com] On Behalf Of Rufus Pollock
> Sent: 13 May 2011 19:48
> To: Maali, Fadi; Richard Cyganiak
> Cc: CKAN discuss
> Subject: Re: [ckan-discuss] CKAN - Google Refine integration
> 
> On 29 April 2011 20:44, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> > On 26 April 2011 11:50, Maali, Fadi <fadi.maali at deri.org> wrote:
> > [...]
> >
> >> 3. using the "RDF Extension for Google Refine" (available at:
> >> http://lablab.linkeddata.deri.ie/2010/grefine-rdf-extension/ ) the
> >> data can be exported as RDF
> >> 4. The result RDF data is saved back to CKAN and linked to the
> >> respective package.
> >>
> >> It is the last step actually that is still missing some details and
> >> requires discussion. Our tentative ideas about it:
> >> - result data is saved to storage.ckan.net (we need help from CKAN
> >> guys here)
> 
> @Fadi: are the new storage features I mentioned (see below) sufficient
> for what you want to do (i.e. direct upload of cleaned data to CKAN)?
> I think this is a really killer feature and if there is anything I can
> do to help please let me know.
>
> Rufus
> 
> > Indeed it does (sorry for slow reply: been meaning to reply earlier
> > but was updating the existing storage code and this took a bit longer
> > than anticipated). The Good News is that this functionality already
> > exists (but hasn't been properly announced before) thanks to the
> > integrated storage extension for CKAN:
> > <https://bitbucket.org/okfn/ckanext-storage/>
> >
> > This extension adds:
> >
> >  * Some new methods to the CKAN API for dealing with storage
> >  * An /upload page to web interface for doing file uploads
> >
> > Upload page is here: (please don't just try this out to experiment as
> > you won't be able to delete :-) -- for experimentation we'll be making
> > a demo.ckan.net available for that in next couple of days!):
> >
> > <http://ckan.net/storage/upload>
> >
> > Here's a demo uploaded file I did earlier:
> >
> > <http://ckan.net/storage/f/file/8630a664-0ae4-485f-99c2-126dae95653a>
> >
> > Having uploaded a file you'd then use that url in a resource (we're
> > working to get this integrated into the package editing workflow --
> > the hard work is done so this should be quite simple ...).
> >
> > However you may not want to upload by 'hand' so there is also an auth
> > api which gives you relevant headers and details for uploading
> > directly to the storage backend (google storage atm).
> >
> > WARNING: this is still a bit alpha and subject to change (please
> > report bugs to ckan-dev or to trac.ckan.org)
> >
> > /api/storage/auth/request/{key}
> >
> > Docstring:
> >
> > Provide authentication information for a request so a client can
> > interact with backend storage directly.
> >
> >        :param label: key.
> >        :param kwargs: sent either via query string for GET or
> >            json-encoded dict for POST). Interpreted as http headers for
> >            request plus an (optional) method parameter (being the HTTP
> >            method).
> >
> >            Examples of headers are:
> >
> >                Content-Type
> >                Content-Encoding (optional)
> >                Content-Length
> >                Content-MD5
> >                Expect (should be '100-Continue')
> >
> >        :return: is a json hash containing various attributes including a
> >        headers dictionary containing an Authorization field which is
> >        good for 15m.
> >
> > Current convention is that you should prefix your 'key' with "file/"
> > (idea is we are uploading to file 'directory'). We may bake this in to
> > the system pretty soon ...
> >
> >> - the result data is considered a new resource of the existing
> >> package. This is automatically registered through the CKAN API.
> >> - along with the RDF data we save the JSON representation of all
> >> Google Refine operations that have been applied to the original data
> >> i.e. any one starting with the CSV file on CKAN can re-apply the
> >> operations using the JSON representation in Google Refine to get an
> >> exact copy of the RDF data
> >>
> >> Does that look reasonable? Any feedback?
> >
> > This seems great and I really like the approach where you save the
> > refine json as well as the new resource.
> >
> > Rufus
> >
> 
> 
> 
> --
> Co-Founder, Open Knowledge Foundation
> Promoting Open Knowledge in a Digital Age
> http://www.okfn.org/ - http://blog.okfn.org/
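
For anyone following the thread, the auth request described in the quoted
docstring can be sketched in Python roughly as below. This is only an
illustration under stated assumptions: the helper names (storage_key,
upload_headers, auth_request_url) are made up for this example, the query
parameter is assumed to be called "method", PUT is assumed as the upload
method, and Content-MD5 is assumed to be the usual base64-encoded MD5
digest. The API was described as alpha and subject to change, so check the
ckanext-storage source before relying on any of it. The sketch only builds
the GET URL; it does not contact a server.

```python
import base64
import hashlib
from urllib.parse import quote, urlencode


def storage_key(filename):
    """Apply the 'file/' prefix convention mentioned in the quoted message."""
    return filename if filename.startswith("file/") else "file/" + filename


def upload_headers(data, content_type):
    """Compute headers describing the payload (hypothetical helper).

    Assumption: Content-MD5 is the base64 encoding of the raw MD5 digest,
    as is standard for that HTTP header.
    """
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return {
        "Content-Type": content_type,
        "Content-Length": str(len(data)),
        "Content-MD5": md5_b64,
    }


def auth_request_url(base_url, filename, headers, method="PUT"):
    """Build a GET URL for /api/storage/auth/request/{key}.

    Headers are passed as query-string parameters, plus an optional
    'method' parameter (parameter name and PUT default are assumptions).
    """
    params = dict(headers)
    params["method"] = method
    key = quote(storage_key(filename))  # quote() leaves '/' unescaped
    base = base_url.rstrip("/")
    return f"{base}/api/storage/auth/request/{key}?{urlencode(params)}"
```

For example, with a hypothetical base URL, calling
auth_request_url("http://demo.ckan.net", "cleaned.rdf",
upload_headers(data, "application/rdf+xml")) gives a URL whose path ends in
/api/storage/auth/request/file/cleaned.rdf, with the headers and method in
the query string. Per the docstring, the JSON response should then include
a headers dictionary with an Authorization field to use for the upload.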

