[okfn-labs] automating connection of CKAN to R stats package

Jonathan Gray jonathan.gray at okfn.org
Fri Jul 27 11:02:49 UTC 2012


Interesting - worth posting about this on the CKAN blog at an appropriate juncture?

J.

On Fri, Jul 27, 2012 at 9:02 AM, Friedrich Lindenberg
<friedrich.lindenberg at okfn.org> wrote:
> Hey Martin,
>
> that's a really nice demo! One thing about this that I have thought
> about for a while is what you do with "head" in your video: finding
> overlapping columns between datasets. Assume you've got a database
> like this:
>
> http://opendatalabs.org/misc/ckan-dataset-links.db
>
> This has each resource in CKAN (in fact it doesn't - tried running it
> overnight but there are quite a lot of errors coming from ES), and, in a
> second table, the names of each field in these (if they were in the
> DataStore). A final table has the facets, i.e. the 100 most common
> values of that field.
>
> How can you make a link recommender? Would you do it on column name,
> value overlap, ...? Could this happen in pyBossa or even fully
> unsupervised?
>
> Having such a recommender would make both Ronalds happy: it lowers the
> cost of finding related data and gives you all the junk links you can
> eat....
>
> - Friedrich
>
> On Fri, Jul 27, 2012 at 1:29 AM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>> On 26 July 2012 22:00, Martin Keegan <martin.keegan at okfn.org> wrote:
>>> Hello,
>>>
>>> part of my project exploring automating processing of tabular data has
>>> been recorded as a video, which is here:
>>> http://mk.ucant.org/media/ckan-to-r.flv; the last three posts on my
>>> blog give some more details; if you were at the recent OKF staff
>>> summit you'll have seen a failed demo of basically what's in the
>>> video, which goes out of its way to show it's not being faked - the
>>> real work could be typed in in about 20 seconds.
>>
>> this is fantastic Martin. Data from CKAN -> R -> integrated and
>> analyzed in 20s :-)
>>
>> BTW for those not able to dig out the posts, they are:
>>
>> Project Ronald, an introduction: http://blog.ucant.org/?p=393
>>
>> Project Ronald, an example: http://blog.ucant.org/?p=414
>>
>> Quoting from the second of these posts:
>>
>> <quote>
>> The first objective of Project Ronald is to make it easy to connect
>> tabular datasets quickly: given two openly-licensed tabular datasets
>> containing a common field, but published by different organisations,
>> it ought to be possible to get them downloaded and joined together in
>> a few seconds. The approach is to identify the components of a system
>> which would do this, implement a minimal version of each, check that
>> the system works as a whole, and then go about replacing each
>> component with better tools, preferably ones already written and
>> matured by someone else.
>> </quote>
>>
>> And finally code on github:
>>
>> https://github.com/mk270/ronald
>>
>> Rufus
>>
>> _______________________________________________
>> okfn-labs mailing list
>> okfn-labs at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/okfn-labs



-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://www.okfn.org

http://twitter.com/jwyg
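
Friedrich's question above (recommend links by column name, by value
overlap, or both?) can be illustrated with a small sketch. This is not
code from the thread or from the ronald repository. It assumes a
hypothetical in-memory structure that maps each resource to its column
names and their most common values, standing in for the fields and
facets tables he describes, whose actual schema is not given here. The
function names, the Jaccard measure and the 0.5 threshold are all
illustrative choices, not anything the thread settles on.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def normalise(name):
    """Lower-case a column name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def recommend_links(resources, min_score=0.5):
    """Score every pair of columns drawn from two different resources.

    `resources` (hypothetical layout) maps a resource id to a dict of
    {column name: list of common values}. Returns a list of
    (score, (resource, column), (resource, column)), best first.
    """
    columns = [
        (res_id, col, values)
        for res_id, cols in resources.items()
        for col, values in cols.items()
    ]
    links = []
    for (r1, c1, v1), (r2, c2, v2) in combinations(columns, 2):
        if r1 == r2:
            continue  # only link columns across resources
        # A candidate link is supported by either signal: an identical
        # normalised name, or overlap between the common values.
        name_match = 1.0 if normalise(c1) == normalise(c2) else 0.0
        score = max(name_match, jaccard(v1, v2))
        if score >= min_score:
            links.append((score, (r1, c1), (r2, c2)))
    return sorted(links, reverse=True)


# Made-up example data, purely for illustration.
example = {
    "spending": {"Country": ["UK", "FR", "DE"], "Amount": ["10", "20"]},
    "population": {"country_code": ["UK", "FR", "IT"], "Year": ["2010", "2011"]},
}
for link in recommend_links(example):
    print(link)
```

Taking the maximum of the two signals means a pair is suggested if
either the names or the values agree. That is deliberately generous,
so it will also produce some of the "junk links" Friedrich mentions;
a stricter variant could require both signals, or pass the candidates
to human reviewers for confirmation.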

