[ckan-dev] Looking for a recent JSON dump from the Data Hub
Paul Miller
paul.miller at cloudofdata.com
Wed Aug 1 10:03:16 UTC 2012
Thanks Mark :-)
I was specifically interested in the whole set rather than just the LD Cloud that Leigh looked at… The 'State of the LOD Cloud' work (http://www4.wiwiss.fu-berlin.de/lodcloud/state/#license) includes some analysis of LOD licenses, most recently updated in September last year. It would suggest that, as the LOD cloud continues to grow, the proportion of licensed data sets within it may actually be *falling*. That's clearly something we might wish to be concerned about.
Maybe Leigh (cc'd) could update his original piece? :-)
Paul
Dr Paul Miller
Cloud of Data
cloudofdata.com/contact
On 1 Aug 2012, at 10:11, Mark Wainwright <mark.wainwright at okfn.org> wrote:
> For anyone who is interested, my quick analysis of license usage in the Data Hub is now up at http://cloudofdata.com/2012/07/thinking-about-open-data-with-a-little-help-from-the-data-hub/.
>
> Great stuff Paul! I've tweeted it from @CKANproject.
>
> If you want to make a more direct comparison with Leigh Dodds' earlier study on the Linked Data cloud, you can restrict your search to the Linking Open Data Cloud group on the DataHub (http://thedatahub.org/group/lodcloud), home of the LOD Cloud. I don't know how you would amend the cURL data to get that working, but if you're interested I'm sure someone else will.
>
> Mark
>
>
> Thanks again, Adrià, for the help…
>
> Paul
>
> Dr Paul Miller
> Cloud of Data
>
> cloudofdata.com/contact
>
> On 31 Jul 2012, at 14:44, Paul Miller <paul.miller at cloudofdata.com> wrote:
>
>> Adrià
>>
>> That gave me exactly what I needed; many thanks.
>>
>> Paul
>>
>> Dr Paul Miller
>> Cloud of Data
>>
>> cloudofdata.com/contact
>>
>> On 31 Jul 2012, at 14:38, Adrià Mercader <amercadero at gmail.com> wrote:
>>
>>> Hi Paul,
>>>
>>> I can't speak for the JSON dump, but an easier way of getting this information may be to use the new version of the search API, which allows faceting by any field. In your case, the field will be "license_id".
>>>
>>> For instance this call:
>>>
>>> curl -X POST -d @payload.json http://thedatahub.org/api/action/package_search
>>>
>>> Where payload.json contains:
>>>
>>> {
>>> "q":"*:*",
>>> "facet":"true",
>>> "facet.field":"license_id",
>>> "rows": 0
>>> }
>>>
>>> This will return a list of all the different indexed values of license_id (whether or not they are "official" licenses); see the attached file.
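
For anyone reading along who would rather script this than use curl, here is a minimal Python sketch of the same call. The request body is the payload quoted above. The function names are made up for illustration, and the response layout (counts under result -> facets -> license_id) is an assumption about how the action API reports facets, so it is worth checking against the attached results file before relying on it.

```python
import json
import urllib.request

# URL taken from the curl example above.
API_URL = "http://thedatahub.org/api/action/package_search"


def build_payload():
    """Return the request body from the message above as a JSON string."""
    return json.dumps({
        "q": "*:*",
        "facet": "true",
        "facet.field": "license_id",
        "rows": 0,
    })


def fetch_facets(url=API_URL):
    """POST the payload and return the parsed JSON response.

    Needs network access. Like `curl -d`, urllib sends the body as
    application/x-www-form-urlencoded unless told otherwise.
    """
    request = urllib.request.Request(url, data=build_payload().encode("utf-8"))
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def license_counts(response):
    """Return (license_id, count) pairs, most frequent first.

    ASSUMPTION: the counts live under result -> facets -> license_id.
    """
    facets = response.get("result", {}).get("facets", {})
    counts = facets.get("license_id", {})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Calling license_counts(fetch_facets()) would then give the values in descending order of use, which makes the odd, non-"official" identifiers easy to spot at the tail of the list.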
>>>
>>> Note that in the next version of CKAN, due in a couple of weeks, these API calls can also be made via a GET request.
>>>
>>> Hope this helps,
>>>
>>> Adrià
>>>
>>>
>>>
>>>
>>>
>>> On 31 July 2012 13:21, Paul Miller <paul.miller at cloudofdata.com> wrote:
>>> Good afternoon
>>>
>>> I've been doing some work with the JSON dump at http://thedatahub.org/dump/. However, this is over a year old, and I'm now trying to get hold of a more current data set.
>>>
>>> Mark Wainwright at the Open Knowledge Foundation suggested that I ask here. So… does anyone have a more current JSON dump, or know of an easy way for me to get myself one?
>>>
>>> Many thanks
>>>
>>> Paul
>>>
>>> And, for those who want some background, or who can see a far better way to do what I'm trying to do… some background.
>>>
>>> I'm looking at the occurrence of different licenses in the Data Hub data. Using the CKAN API, it's easy to see a list of permissible licenses: http://thedatahub.org/api/1/rest/licenses
>>>
>>> It's then straightforward to step through the JSON dump, getting a count of occurrences of different values for license_id. In the year-old data set I've got, there are just over 2,000 records (half the number currently available in the Data Hub), and about a third of those have license values (mostly 'null', but also others like 'apache' and 'gpl-2.0' '3.0') that aren't part of the set of values reported from http://thedatahub.org/api/1/rest/licenses. It's therefore useful (to me) to be able to visually skim the file for these odd values, rather than simply querying the API for known terms…
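
A rough sketch of that counting step, in Python. It assumes the dump parses to a JSON list of package records, each with a license_id key, and that the official identifiers have already been collected into a set. Both are assumptions about the data layout rather than anything confirmed in this thread, and the function names are illustrative only.

```python
import json
from collections import Counter


def load_records(path):
    """Load the dump from disk. ASSUMPTION: it is one JSON list of records."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_license_ids(records):
    """Count how often each license_id value occurs.

    Records with no license_id key are counted under None, the same
    bucket as an explicit JSON null.
    """
    return Counter(record.get("license_id") for record in records)


def unofficial_values(counts, official_ids):
    """Return only the counted values that are not in the official set."""
    return {value: n for value, n in counts.items() if value not in official_ids}
```

Printing the result of unofficial_values sorted by count would give the short list of odd values to skim, instead of reading through the whole file by eye.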
>>>
>>> Any help gratefully received.
>>>
>>>
>>> Dr Paul Miller
>>> Cloud of Data
>>>
>>> cloudofdata.com/contact
>>>
>>>
>>> _______________________________________________
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/ckan-dev
>>>
>>>
>>> <results.json>
>>
>
>
>
>
>
>
> --
> Mark Wainwright, CKAN Community Co-ordinator
> Open Knowledge Foundation http://okfn.org/
> CKAN on Twitter: @CKANproject