[ckan-discuss] CKAN - Google Refine integration
Rufus Pollock
rufus.pollock at okfn.org
Fri May 13 19:47:41 BST 2011
On 29 April 2011 20:44, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 26 April 2011 11:50, Maali, Fadi <fadi.maali at deri.org> wrote:
> [...]
>
>> 3. using the "RDF Extension for Google Refine" (available at:
>> http://lablab.linkeddata.deri.ie/2010/grefine-rdf-extension/ ) the data
>> can be exported as RDF
>> 4. The resulting RDF data is saved back to CKAN and linked to the
>> respective package.
>>
>> It is the last step actually that is still missing some details and
>> requires discussion. Our tentative ideas about it:
>> - result data is saved to storage.ckan.net (we need help from CKAN guys
>> here)
@Fadi: are the new storage features I mentioned (see below) sufficient
for what you want to do (i.e. direct upload of cleaned data to CKAN)?
I think this is a really killer feature and if there is anything I can
do to help please let me know.
Rufus
> Indeed it does (sorry for slow reply: been meaning to reply earlier
> but was updating the existing storage code and this took a bit longer
> than anticipated). The Good News is that this functionality already
> exists (but hasn't been properly announced before) thanks to the
> integrated storage extension for CKAN:
> <https://bitbucket.org/okfn/ckanext-storage/>
>
> This extension adds:
>
> * Some new methods to the CKAN API for dealing with storage
> * An /upload page to web interface for doing file uploads
>
> The upload page is here (please don't just try it out to experiment,
> as you won't be able to delete uploads :-) -- we'll be making
> demo.ckan.net available for experimentation in the next couple of days!):
>
> <http://ckan.net/storage/upload>
>
> Here's a demo uploaded file I did earlier:
>
> <http://ckan.net/storage/f/file/8630a664-0ae4-485f-99c2-126dae95653a>
>
> Having uploaded a file, you'd then use its URL in a resource (we're
> working to get this integrated into the package editing workflow --
> the hard work is done, so this should be quite simple ...).
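For illustration, here is a minimal sketch of how a client might register an uploaded file's URL as a new resource on a package. Only the payload-building step is shown (not the HTTP call), and the shape of the package dict is an assumption based on the REST API of the time, not something confirmed in this thread:

```python
import json

def add_resource_payload(package, file_url, description=""):
    """Build the JSON body for updating a package so that an
    uploaded file's URL becomes a new resource.

    `package` is assumed to be the package dict as returned by the
    CKAN REST API; the exact field names are illustrative.
    """
    updated = dict(package)  # avoid mutating the caller's dict
    resources = list(updated.get("resources", []))
    resources.append({
        "url": file_url,
        "description": description,
    })
    updated["resources"] = resources
    return json.dumps(updated)
```

The resulting body would then be PUT back to the package's API endpoint with an Authorization header carrying your API key (check the current CKAN API docs for the exact route and auth scheme).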
>
> However, you may not want to upload by hand, so there is also an auth
> API which gives you the relevant headers and details for uploading
> directly to the storage backend (Google Storage at the moment).
>
> WARNING: this is still a bit alpha and subject to change (please
> report bugs to ckan-dev or to trac.ckan.org)
>
> /api/storage/auth/request/{key}
>
> Docstring:
>
> Provide authentication information for a request so a client can
> interact with backend storage directly.
>
> :param label: the key.
> :param kwargs: sent either via the query string (for GET) or as a
> json-encoded dict (for POST). Interpreted as HTTP headers for the
> request, plus an (optional) 'method' parameter (the HTTP method).
>
> Examples of headers are:
>
> Content-Type
> Content-Encoding (optional)
> Content-Length
> Content-MD5
> Expect (should be '100-Continue')
>
> :return: a json hash containing various attributes, including a
> headers dictionary with an Authorization field which is valid for
> 15 minutes.
>
> The current convention is that you should prefix your key with "file/"
> (the idea is that we are uploading to a file 'directory'). We may bake
> this into the system pretty soon ...
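As a concrete sketch of calling this endpoint with GET: the following builds the request URL, encoding the upload headers into the query string as the docstring describes. The base URL is a placeholder and the encoding is an interpretation of the docstring, not tested against a live instance:

```python
from urllib.parse import quote, urlencode

CKAN_BASE = "http://ckan.net"  # placeholder instance

def storage_auth_url(key, headers, method="PUT"):
    """Build the GET URL for /api/storage/auth/request/{key}.

    The HTTP headers for the eventual upload (Content-Type,
    Content-MD5, ...) are sent as query-string parameters, along
    with an optional 'method' parameter naming the HTTP method.
    """
    if not key.startswith("file/"):
        key = "file/" + key  # current convention: keys live under "file/"
    params = dict(headers)
    params["method"] = method
    return "%s/api/storage/auth/request/%s?%s" % (
        CKAN_BASE, quote(key), urlencode(params))
```

The headers dict in the JSON response (including the short-lived Authorization field) would then be attached to the actual PUT against the storage backend.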
>
>> - the resulting data is considered a new resource of the existing
>> package. This is registered automatically through the CKAN API.
>> - along with the RDF data we save the JSON representation of all
>> Google Refine operations that were applied to the original data, i.e.
>> anyone starting with the CSV file on CKAN can re-apply the operations
>> in Google Refine using the JSON representation to get an exact copy
>> of the RDF data.
>>
>> Does that look reasonable? Any feedback?
>
> This seems great, and I really like the approach where you save the
> Refine operations JSON as well as the new resource.
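To make the proposal concrete, here is a small sketch of what the two new resources (the RDF export plus the Refine operation history) might look like when registered against the package. The field names are illustrative, not a confirmed CKAN schema:

```python
def refine_export_resources(package_name, rdf_url, operations_url):
    """Sketch of the proposed step 4: both the exported RDF and the
    JSON history of Google Refine operations become resources of the
    existing package, so anyone can replay the transformation from
    the original CSV.
    """
    return [
        {"url": rdf_url,
         "format": "application/rdf+xml",
         "description": "RDF export of %s (via Google Refine)"
                        % package_name},
        {"url": operations_url,
         "format": "application/json",
         "description": "Google Refine operation history; re-apply to "
                        "the original CSV to reproduce the RDF exactly"},
    ]
```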
>
> Rufus
>
--
Co-Founder, Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/