[ckan-discuss] CKAN - Google Refine integration

Rufus Pollock rufus.pollock at okfn.org
Fri Apr 29 20:44:27 BST 2011


On 26 April 2011 11:50, Maali, Fadi <fadi.maali at deri.org> wrote:
[...]

> 3. using the "RDF Extension for Google Refine" (available at:
> http://lablab.linkeddata.deri.ie/2010/grefine-rdf-extension/ ) the data
> can be exported as RDF
> 4. The result RDF data is saved back to CKAN and linked to the
> respective package.
>
> It is the last step actually that is still missing some details and
> requires discussion. Our tentative ideas about it:
> - result data is saved to storage.ckan.net (we need help from CKAN guys
> here)

Indeed it does (sorry for slow reply: been meaning to reply earlier
but was updating the existing storage code and this took a bit longer
than anticipated). The Good News is that this functionality already
exists (but hasn't been properly announced before) thanks to the
integrated storage extension for CKAN:
<https://bitbucket.org/okfn/ckanext-storage/>

This extension adds:

  * Some new methods to the CKAN API for dealing with storage
  * An /upload page in the web interface for doing file uploads

The upload page is here (please don't use it just to experiment, as
you won't be able to delete what you upload :-) -- we'll be making
a demo.ckan.net available for experimentation in the next couple of days!):

<http://ckan.net/storage/upload>

Here's a demo uploaded file I did earlier:

<http://ckan.net/storage/f/file/8630a664-0ae4-485f-99c2-126dae95653a>

Having uploaded a file you'd then use that url in a resource (we're
working to get this integrated into the package editing workflow --
the hard work is done so this should be quite simple ...).

However, you may not want to upload by 'hand', so there is also an auth
API which gives you the relevant headers and details for uploading
directly to the storage backend (Google Storage atm).

WARNING: this is still a bit alpha and subject to change (please
report bugs to ckan-dev or to trac.ckan.org)

/api/storage/auth/request/{key}

Docstring:

Provide authentication information for a request so a client can
interact with backend storage directly.

        :param label: key.
        :param kwargs: sent either via query string for GET or as a json-encoded
            dict for POST. Interpreted as http headers for request plus an
            (optional) method parameter (being the HTTP method).

            Examples of headers are:

                Content-Type
                Content-Encoding (optional)
                Content-Length
                Content-MD5
                Expect (should be '100-Continue')

        :return: is a json hash containing various attributes including a
        headers dictionary containing an Authorization field which is good for
        15m.

Current convention is that you should prefix your 'key' with "file/"
(the idea is that we are uploading to a file 'directory'). We may bake
this into the system pretty soon ...
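
To make the auth call a bit more concrete, here is a rough sketch of how a client might build the GET request for /api/storage/auth/request/{key}. It only illustrates the description above. The helper names, the example headers and the choice of http://ckan.net as the API host are assumptions, and since the API is still alpha the details may well change.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the API is served from the same host as the upload page.
BASE_URL = "http://ckan.net"


def storage_key(filename):
    """Apply the current convention of prefixing keys with "file/"."""
    return filename if filename.startswith("file/") else "file/" + filename


def auth_request_url(key, headers, method=None, base_url=BASE_URL):
    """Build the GET URL for /api/storage/auth/request/{key}.

    The intended upload headers (and the optional HTTP method) are passed
    as query-string parameters, as the docstring describes.
    """
    params = dict(headers)
    if method is not None:
        params["method"] = method
    query = urlencode(sorted(params.items()))
    return "%s/api/storage/auth/request/%s?%s" % (
        base_url.rstrip("/"), quote(key), query)


def fetch_auth(key, headers, method=None, base_url=BASE_URL):
    """Call the auth API and return the decoded JSON hash (needs network)."""
    with urlopen(auth_request_url(key, headers, method, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical file name and header values, for illustration only.
url = auth_request_url(
    storage_key("data.rdf"),
    {"Content-Type": "application/rdf+xml", "Content-Length": "1234"},
    method="PUT",
)
print(url)
```

If the call succeeds, the "headers" entry of the hash returned by fetch_auth should contain the Authorization field mentioned in the docstring, which the client would then send along with its upload to the storage backend.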

> - the result data is considered a new resource of the existing package.
> This is automatically registered through the CKAN API.
> - along with the RDF data we save the JSON representation of all Google
> Refine operations that have been applied to the original data i.e. any
> one starting with the CSV file on CKAN can re-apply the operations using
> the JSON representation in Google Refine to get an exact copy of the RDF
> data
>
> Does that look reasonable? Any feedback?

This seems great and I really like the approach where you save the
refine json as well as the new resource.
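
As a rough illustration of that approach, the sketch below adds two resources to a package dict: one for the RDF export and one for the Refine operations JSON. It assumes the package is represented as a dict with a "resources" list whose entries carry "url", "format" and "description". The function name, the example URLs and the format strings are hypothetical, and sending the updated package back through the CKAN API is left out.

```python
import copy


def add_derived_resources(package, rdf_url, operations_url):
    """Return a copy of a package dict with two extra resources appended."""
    updated = copy.deepcopy(package)
    resources = updated.setdefault("resources", [])
    # The RDF data exported from Google Refine.
    resources.append({
        "url": rdf_url,
        "format": "application/rdf+xml",
        "description": "RDF export produced with Google Refine",
    })
    # The operations history, so others can re-apply it to the original CSV.
    resources.append({
        "url": operations_url,
        "format": "application/json",
        "description": "Google Refine operations used to produce the RDF export",
    })
    return updated


# Hypothetical package and storage URLs, for illustration only.
package = {
    "name": "example-package",
    "resources": [{"url": "http://example.org/data.csv", "format": "csv"}],
}
updated = add_derived_resources(
    package,
    "http://example.org/storage/f/file/data.rdf",
    "http://example.org/storage/f/file/data-operations.json",
)
print(len(updated["resources"]))
```

Working on a copy keeps the original package dict untouched, which makes it easier to compare before and after when checking what would be sent to CKAN.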

Rufus


