[ckan-dev] Looking for a recent JSON dump from the Data Hub

Adrià Mercader amercadero at gmail.com
Tue Jul 31 13:38:04 UTC 2012


Hi Paul,

I can't speak for the JSON dump, but an easier way of getting this
information may be to use the new version of the search API, which lets you
facet on any field. In your case, that field would be "license_id".

For instance this call:

curl -X POST -d @payload.json \
    http://thedatahub.org/api/action/package_search

Where payload.json contains:

{
    "q":"*:*",
    "facet":"true",
    "facet.field":"license_id",
    "rows": 0
}

Will return a list of all the distinct license_id values in the index
(whether or not they are "official" licenses); see the attached file.
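
For anyone who would rather script this than use curl, here is a rough
Python sketch of the same call, plus the comparison against the official
license list that Paul describes. It is only a sketch under assumptions:
the function names are made up for illustration, the facet counts are
assumed to sit under result -> facets -> license_id in the response, and
the JSON Content-Type header is my addition (the curl call above does not
set one).

```python
import json
import urllib.request
from collections import Counter

# URL taken from the curl call above.
SEARCH_URL = "http://thedatahub.org/api/action/package_search"


def build_payload(field="license_id"):
    """Same body as payload.json above, with the facet field as a parameter."""
    return {"q": "*:*", "facet": "true", "facet.field": field, "rows": 0}


def facet_counts(response, field="license_id"):
    """Extract {value: count} for one facet from a decoded response.

    Assumption: the counts live under response["result"]["facets"][field].
    Returns an empty Counter if that path is missing.
    """
    facets = response.get("result", {}).get("facets", {})
    return Counter(facets.get(field, {}))


def unofficial(counts, official_ids):
    """Facet values that are not in the given set of official license ids."""
    return {value: n for value, n in counts.items() if value not in official_ids}


def fetch_counts(field="license_id", url=SEARCH_URL):
    """POST the payload and return the facet counts (needs network access)."""
    body = json.dumps(build_payload(field)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return facet_counts(json.loads(reply.read().decode("utf-8")), field)
```

Calling fetch_counts() and passing the result to unofficial(), together
with the set of ids read from http://thedatahub.org/api/1/rest/licenses,
should leave only the odd values to skim through.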

Note that in the next version of CKAN, due in a couple of weeks, these API
calls will also be possible via a GET request.
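
Once that is available, the same query could presumably be expressed as a
URL. A small sketch of building it in Python, assuming the GET form accepts
the same parameter names as the POST body above (I have not confirmed this
against the new release, and search_url is just an illustrative name):

```python
from urllib.parse import urlencode

# Base URL taken from the curl call above.
SEARCH_URL = "http://thedatahub.org/api/action/package_search"


def search_url(field="license_id", base=SEARCH_URL):
    """Build a GET URL carrying the same parameters as payload.json.

    Assumption: the GET endpoint takes the same parameter names as the
    POST body does.
    """
    params = {"q": "*:*", "facet": "true", "facet.field": field, "rows": 0}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_url())
```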

Hope this helps,

Adrià





On 31 July 2012 13:21, Paul Miller <paul.miller at cloudofdata.com> wrote:

> Good afternoon
>
> I've been doing some work with the JSON dump at
> http://thedatahub.org/dump/. However, this is over a year old, and I'm
> now trying to get hold of a more current data set.
>
> Mark Wainwright at the Open Knowledge Foundation suggested that I ask
> here. So… does anyone have a more current JSON dump, or know of an easy way
> for me to get myself one?
>
> Many thanks
>
> Paul
>
> And, for those who want some background, or who can see a far better way
> to do what I'm trying to do… some background.
>
> I'm looking at the occurrence of different licenses in the Data Hub data.
> Using the CKAN API, it's easy to see a list of permissible licenses:
> http://thedatahub.org/api/1/rest/licenses
>
> It's then straightforward to step through the JSON dump, getting a count
> of occurrences of different values for license_id. In the year-old data set
> I've got, there are just over 2,000 records (half the number currently
> available in the Data Hub), and about a third of those have license values
> (mostly 'null', but also others like 'apache' and 'gpl-2.0' '3.0') that
> aren't part of the set of values reported from
> http://thedatahub.org/api/1/rest/licenses. It's therefore useful (to me)
> to be able to visually skim the file for these odd values, rather than
> simply querying the api for known terms…
>
> Any help gratefully received.
>
>
> Dr Paul Miller
> Cloud of Data
>
> cloudofdata.com/contact
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: results.json
Type: application/json
Size: 9523 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/ckan-dev/attachments/20120731/0338df4c/attachment-0001.json>