[ckan-dev] Looking for a recent JSON dump from the Data Hub

Adrià Mercader amercadero at gmail.com
Wed Aug 1 09:55:49 UTC 2012


On 1 August 2012 10:11, Mark Wainwright <mark.wainwright at okfn.org> wrote:
>
> If you want to make a more direct comparison with Leigh Dodds' earlier study on the Linked Data cloud, you can restrict your search to the Linking Open Data Cloud group on the DataHub (http://thedatahub.org/group/lodcloud), home of the LOD Cloud. I don't know how you would amend the cURL data to get that working, but if you're interested I'm sure someone else will.

You just have to define the appropriate filter on the API call:

{
    "q":"groups:lodcloud",
    "facet":"true",
    "facet.field":"license_id",
    "rows": 0
}

This will return the license facets for just the 327 datasets in the
LOD cloud group.

The q parameter supports Solr syntax. For more info check the docs:

http://docs.ckan.org/en/latest/apiv3.html#ckan.logic.action.get.package_search
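
If you would rather script this than use curl, here is a minimal Python sketch of the same call. The endpoint and payload are the ones quoted in this thread. The helper names, the Content-Type header and the response shape (facet counts under result -> facets -> license_id) are my assumptions and may need adjusting for your CKAN version.

```python
import json
import urllib.request

# Endpoint taken from the curl call quoted in this thread.
API_URL = "http://thedatahub.org/api/action/package_search"


def build_payload(group=None):
    """Return the facet query from this thread, optionally limited to one group."""
    return {
        "q": "groups:%s" % group if group else "*:*",
        "facet": "true",
        "facet.field": "license_id",
        "rows": 0,  # only the facet counts are wanted, not the datasets
    }


def build_request(payload, url=API_URL):
    """Wrap the payload in a POST request (JSON Content-Type is an assumption)."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def license_counts(response):
    """Extract (license_id, count) pairs, most frequent first.

    Assumes the shape {"result": {"facets": {"license_id": {id: count}}}}.
    """
    facets = response.get("result", {}).get("facets", {})
    counts = facets.get("license_id", {})
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def fetch_license_counts(group=None):
    """Run the query against the live API and return the sorted counts."""
    request = build_request(build_payload(group))
    with urllib.request.urlopen(request) as resp:
        return license_counts(json.load(resp))
```

Calling fetch_license_counts("lodcloud") should then give the per-license counts for the LOD cloud group, and fetch_license_counts() the counts for every dataset.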





> Mark
>
>
>>
>> Thanks again, Adrià, for the help…
>>
>> Paul
>>
>> Dr Paul Miller
>> Cloud of Data
>>
>> cloudofdata.com/contact
>>
>> On 31 Jul 2012, at 14:44, Paul Miller <paul.miller at cloudofdata.com> wrote:
>>
>> Adrià
>>
>> That gave me exactly what I needed; many thanks.
>>
>> Paul
>>
>> Dr Paul Miller
>> Cloud of Data
>>
>> cloudofdata.com/contact
>>
>> On 31 Jul 2012, at 14:38, Adrià Mercader <amercadero at gmail.com> wrote:
>>
>> Hi Paul,
>>
>> I can't speak for the JSON dump, but an easier way of getting this information may be to use the new version of the search API, which allows faceting by any field. In your case, that field is "license_id".
>>
>> For instance this call:
>>
>> curl -X POST -d @payload.json http://thedatahub.org/api/action/package_search
>>
>> Where payload.json contains:
>>
>> {
>>     "q":"*:*",
>>     "facet":"true",
>>     "facet.field":"license_id",
>>     "rows": 0
>> }
>>
>> Will return a list of all the distinct license_id values that have been indexed (whether or not they are "official" licenses); see the attached file.
>>
>> Note that in the next version of CKAN, due in a couple of weeks, these API calls can also be made via a GET request.
>>
>> Hope this helps,
>>
>> Adrià
>>
>>
>>
>>
>>
>> On 31 July 2012 13:21, Paul Miller <paul.miller at cloudofdata.com> wrote:
>>>
>>> Good afternoon
>>>
>>> I've been doing some work with the JSON dump at http://thedatahub.org/dump/. However, this is over a year old, and I'm now trying to get hold of a more current data set.
>>>
>>> Mark Wainwright at the Open Knowledge Foundation suggested that I ask here. So… does anyone have a more current JSON dump, or know of an easy way for me to get myself one?
>>>
>>> Many thanks
>>>
>>> Paul
>>>
>>> And, for those who want some background, or who can see a far better way to do what I'm trying to do… some background.
>>>
>>> I'm looking at the occurrence of different licenses in the Data Hub data. Using the CKAN API, it's easy to see a list of permissible licenses: http://thedatahub.org/api/1/rest/licenses
>>>
>>> It's then straightforward to step through the JSON dump, getting a count of occurrences of different values for license_id. In the year-old data set I've got, there are just over 2,000 records (half the number currently available in the Data Hub), and about a third of those have license values (mostly 'null', but also others like 'apache' and 'gpl-2.0' '3.0') that aren't part of the set of values reported from http://thedatahub.org/api/1/rest/licenses. It's therefore useful (to me) to be able to visually skim the file for these odd values, rather than simply querying the api for known terms…
>>>
>>> Any help gratefully received.
>>>
>>>
>>> Dr Paul Miller
>>> Cloud of Data
>>>
>>> cloudofdata.com/contact
>>>
>>>
>>> _______________________________________________
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>>>
>>
>> <results.json>
>>
>
>
>
> --
> Mark Wainwright, CKAN Community Co-ordinator
> Open Knowledge Foundation http://okfn.org/
> CKAN on Twitter: @CKANproject
>



