[ckan-dev] Looking for a recent JSON dump from the Data Hub

Jonathan Gray jonathan.gray at okfn.org
Tue Jul 31 19:55:51 UTC 2012


Fantastic work Paul. Very interesting indeed!

Mark, Irina, all: worth cross-posting/linking to this from the CKAN blog,
perhaps with a little context?

J.

On Tue, Jul 31, 2012 at 7:06 PM, Paul Miller <paul.miller at cloudofdata.com> wrote:

> For anyone who is interested, my quick analysis of license usage in the
> Data Hub is now up at
> http://cloudofdata.com/2012/07/thinking-about-open-data-with-a-little-help-from-the-data-hub/
> .
>
> Thanks again, Adrià, for the help…
>
> Paul
>
> Dr Paul Miller
> Cloud of Data
>
> cloudofdata.com/contact
>
> On 31 Jul 2012, at 14:44, Paul Miller <paul.miller at cloudofdata.com> wrote:
>
> Adrià
>
> That gave me exactly what I needed; many thanks.
>
> Paul
>
> Dr Paul Miller
> Cloud of Data
>
> cloudofdata.com/contact
>
> On 31 Jul 2012, at 14:38, Adrià Mercader <amercadero at gmail.com> wrote:
>
> Hi Paul,
>
> I can't speak for the JSON dump, but an easier way of getting this
> information may be to use the new version of the search API, which allows
> faceting by any field. In your case, that field is "license_id".
>
> For instance this call:
>
> curl -X POST -d @payload.json
> http://thedatahub.org/api/action/package_search
>
> Where payload.json contains:
>
> {
>     "q":"*:*",
>     "facet":"true",
>     "facet.field":"license_id",
>     "rows": 0
> }
>
> This will return a list of all the distinct license_id values indexed,
> whether or not they are "official" licenses (see attached file).
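
For anyone who would rather script the call above than use curl, here is a minimal Python sketch. The URL and payload are the ones quoted in the message. The helper names are made up for illustration, and the response layout (counts under result -> facets -> license_id) and the JSON Content-Type header are assumptions that should be checked against the reply from your own CKAN version.

```python
import json
from urllib import request

# Endpoint taken from the curl call quoted above.
SEARCH_URL = "http://thedatahub.org/api/action/package_search"


def build_payload(field="license_id"):
    """Return the request body: facet on one field, fetch no rows."""
    return {"q": "*:*", "facet": "true", "facet.field": field, "rows": 0}


def facet_counts(response, field="license_id"):
    """Pull {value: count} out of a decoded response.

    Assumption: the counts sit under result -> facets -> <field>.
    """
    return dict(response["result"]["facets"][field])


def fetch_facets(url=SEARCH_URL, field="license_id"):
    """POST the payload and return the facet counts (needs network access)."""
    body = json.dumps(build_payload(field)).encode("utf-8")
    # Assumption: the server accepts a JSON content type for this body.
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return facet_counts(json.load(resp), field)
```

Calling fetch_facets() would then give a dictionary keyed by license_id value, which can be sorted or compared against the official license list.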
>
> Note that in the next version of CKAN, due in a couple of weeks, these API
> calls can also be made via a GET request.
>
> Hope this helps,
>
> Adrià
>
>
>
>
>
> On 31 July 2012 13:21, Paul Miller <paul.miller at cloudofdata.com> wrote:
>
>> Good afternoon
>>
>> I've been doing some work with the JSON dump at
>> http://thedatahub.org/dump/. However, this is over a year old, and I'm
>> now trying to get hold of a more current data set.
>>
>> Mark Wainwright at the Open Knowledge Foundation suggested that I ask
>> here. So… does anyone have a more current JSON dump, or know of an easy way
>> for me to get myself one?
>>
>> Many thanks
>>
>> Paul
>>
>> And, for those who want some background, or who can see a far better way
>> to do what I'm trying to do, here is what I'm working on.
>>
>> I'm looking at the occurrence of different licenses in the Data Hub data.
>> Using the CKAN API, it's easy to see a list of permissible licenses:
>> http://thedatahub.org/api/1/rest/licenses
>>
>> It's then straightforward to step through the JSON dump, getting a count
>> of occurrences of different values for license_id. In the year-old data set
>> I've got, there are just over 2,000 records (half the number currently
>> available in the Data Hub), and about a third of those have license values
>> (mostly 'null', but also others like 'apache' and 'gpl-2.0' '3.0') that
>> aren't part of the set of values reported from
>> http://thedatahub.org/api/1/rest/licenses. It's therefore useful (to me)
>> to be able to visually skim the file for these odd values, rather than
>> simply querying the api for known terms…
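
The counting step described above could be sketched in Python roughly as follows. This assumes the dump decodes to a list of package dictionaries, each with an optional "license_id" key, and that the licenses call returns a list of dictionaries with an "id" key. Neither layout is confirmed in this thread, and the function and file names are hypothetical.

```python
import json
from collections import Counter


def load_json(path):
    """Read and decode a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def count_license_ids(packages):
    """Count license_id values; a missing or null value is counted as None."""
    return Counter(pkg.get("license_id") for pkg in packages)


def unofficial_values(counts, official_ids):
    """Return (value, count) pairs not in the official set, most common first."""
    official = set(official_ids)
    return [(value, n) for value, n in counts.most_common()
            if value not in official]


if __name__ == "__main__":
    # Hypothetical local copies of the dump and of the licenses list.
    packages = load_json("dump.json")
    official = [lic["id"] for lic in load_json("licenses.json")]
    for value, n in unofficial_values(count_license_ids(packages), official):
        print(n, repr(value))
```

Printing only the values outside the official set keeps the odd ones (null, 'apache' and so on) easy to skim without reading the whole file.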
>>
>> Any help gratefully received.
>>
>>
>> Dr Paul Miller
>> Cloud of Data
>>
>> cloudofdata.com/contact
>>
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>>
>>
> <results.json>
>


-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://www.okfn.org

http://twitter.com/jwyg