[ckan-dev] Looking for a recent JSON dump from the Data Hub
Paul Miller
paul.miller at cloudofdata.com
Tue Jul 31 18:06:19 UTC 2012
For anyone who is interested, my quick analysis of license usage in the Data Hub is now up at http://cloudofdata.com/2012/07/thinking-about-open-data-with-a-little-help-from-the-data-hub/.
Thanks again, Adrià, for the help…
Paul
Dr Paul Miller
Cloud of Data
cloudofdata.com/contact
On 31 Jul 2012, at 14:44, Paul Miller <paul.miller at cloudofdata.com> wrote:
> Adrià
>
> That gave me exactly what I needed; many thanks.
>
> Paul
>
> Dr Paul Miller
> Cloud of Data
>
> cloudofdata.com/contact
>
> On 31 Jul 2012, at 14:38, Adrià Mercader <amercadero at gmail.com> wrote:
>
>> Hi Paul,
>>
>> I can't speak for the JSON dump, but an easier way of getting this information may be to use the new version of the search API, which allows faceting on any field. In your case, that would be "license_id".
>>
>> For instance this call:
>>
>> curl -X POST -d @payload.json http://thedatahub.org/api/action/package_search
>>
>> Where payload.json contains:
>>
>> {
>> "q":"*:*",
>> "facet":"true",
>> "facet.field":"license_id",
>> "rows": 0
>> }
>>
>> This will return a list of all the distinct values indexed for license_id (whether or not they are "official" licenses); see the attached file.
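
The same facet request can be sketched in Python. This is only a minimal sketch: the URL and payload are copied from the curl call above, but the function names are made up for illustration, sending the body with a JSON content type is an assumption, and so is the response layout (facet counts under "result" -> "facets" -> field name), which may differ between CKAN versions.

```python
import json
import urllib.request

# URL taken from the curl call above.
SEARCH_URL = "http://thedatahub.org/api/action/package_search"


def build_payload(field="license_id"):
    """Return the same request body as payload.json, for the given facet field."""
    return {"q": "*:*", "facet": "true", "facet.field": field, "rows": 0}


def fetch_search(payload, url=SEARCH_URL):
    """POST the payload as JSON and return the decoded response (needs network access)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the server accepts a JSON content type for this call.
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def facet_counts(response, field="license_id"):
    """Return (value, count) pairs for one facet, most frequent first.

    Assumes the counts live under response["result"]["facets"][field];
    returns an empty list if that path is missing.
    """
    counts = response.get("result", {}).get("facets", {}).get(field, {})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Calling facet_counts(fetch_search(build_payload())) would then give the license_id values with their counts, provided the server is reachable and the response really has the assumed layout.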
>>
>> Note that in the next version of CKAN, due in a couple of weeks, these API calls will be possible via a GET request.
>>
>> Hope this helps,
>>
>> Adrià
>>
>> On 31 July 2012 13:21, Paul Miller <paul.miller at cloudofdata.com> wrote:
>> Good afternoon
>>
>> I've been doing some work with the JSON dump at http://thedatahub.org/dump/. However, this is over a year old, and I'm now trying to get hold of a more current data set.
>>
>> Mark Wainwright at the Open Knowledge Foundation suggested that I ask here. So… does anyone have a more current JSON dump, or know of an easy way for me to get myself one?
>>
>> Many thanks
>>
>> Paul
>>
>> And, for those who want some background, or who can see a far better way to do what I'm trying to do, here it is.
>>
>> I'm looking at the occurrence of different licenses in the Data Hub data. Using the CKAN API, it's easy to get a list of permissible licenses: http://thedatahub.org/api/1/rest/licenses
>>
>> It's then straightforward to step through the JSON dump, getting a count of occurrences of different values for license_id. In the year-old data set I've got, there are just over 2,000 records (half the number currently available in the Data Hub), and about a third of those have license values (mostly 'null', but also others like 'apache' and 'gpl-2.0' '3.0') that aren't part of the set of values reported from http://thedatahub.org/api/1/rest/licenses. It's therefore useful (to me) to be able to visually skim the file for these odd values, rather than simply querying the api for known terms…
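
The counting step described above might look roughly like this in Python. It is a sketch under stated assumptions: the dump is taken to be a single JSON array of package objects, each with an optional "license_id" key, and the function names are hypothetical. The set of permitted ids would have to be built separately from the licenses list.

```python
import json
from collections import Counter


def load_packages(path):
    """Read the dump, assumed here to be one JSON array of package objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_license_ids(packages):
    """Count each license_id value; a missing or null value is counted under None."""
    return Counter(package.get("license_id") for package in packages)


def unlisted_values(counts, listed_ids):
    """Keep only the values that are not in the set of listed license ids."""
    return {value: n for value, n in counts.items() if value not in listed_ids}
```

Passing the result of count_license_ids to unlisted_values, together with the ids from the licenses list, leaves just the odd values (including None for null) and how often each occurs.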
>>
>> Any help gratefully received.
>>
>>
>> Dr Paul Miller
>> Cloud of Data
>>
>> cloudofdata.com/contact
>>
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>>
>>
>> <results.json>
>