[okfn-discuss] British Library data JSON wrangling

Ben O'Steen bosteen at gmail.com
Tue Nov 8 11:05:41 UTC 2016


Sorry about that! The main request that I was trying to satisfy is the
demand for "everything". https://data.bl.uk is meant for all the items that
we cannot deliver through other online means or by shipping a hard drive.

The JSON structure links together a number of services that otherwise have
no easy machine-readable connections between them: mainly Flickr, the BL
catalog and the access portal system that you can (or should be able to)
download the PDFs from.

I have loaded the JSON file into OpenRefine (and, incidentally, it can be
opened with Python's json module; it was created with that module too.)
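
As a rough illustration of the json-module route, here is a minimal sketch
for loading the file and getting a first look at its shape. The function
names are hypothetical, and it assumes nothing about the file beyond it
being valid JSON; it only reports what it finds at the top level.

```python
import json


def load_records(path):
    """Load a JSON file and return its top-level value."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarise(data):
    """Return (type name, item count, up to five sample keys)."""
    if isinstance(data, dict):
        return ("dict", len(data), sorted(data)[:5])
    if isinstance(data, list):
        first = data[0] if data else None
        # Only report keys when the first element is itself an object.
        keys = sorted(first)[:5] if isinstance(first, dict) else []
        return ("list", len(data), keys)
    return (type(data).__name__, 1, [])
```

Calling summarise(load_records("books.json")) (the file name here is a
placeholder) shows whether the top level is a list or an object, and which
keys to look at next.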

If you have Python installed, this script will create a flattened UTF-8
encoded CSV file from the JSON, with most fields included:

https://gist.github.com/benosteen/7dd20109bbdf7716218ba73279c70a3c
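
For readers who want the general idea without opening the gist, the
following is a minimal sketch of flattening JSON records into a UTF-8 CSV.
It is not the gist's code. It assumes the top level is either a list of
records or an object whose values are records; the function names, the
dotted column naming and the "; " joiner for simple lists are all
illustrative choices.

```python
import csv
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}."))
    elif isinstance(value, list):
        if all(not isinstance(item, (dict, list)) for item in value):
            # A list of plain values becomes one joined cell.
            flat[prefix[:-1]] = "; ".join(str(item) for item in value)
        else:
            for index, child in enumerate(value):
                flat.update(flatten(child, f"{prefix}{index}."))
    else:
        flat[prefix[:-1]] = value
    return flat


def json_to_csv(json_path, csv_path):
    """Write one CSV row per record; return the number of rows written."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: records are the list items, or the values of a top-level object.
    records = list(data.values()) if isinstance(data, dict) else data
    rows = [flatten(record) for record in records]
    fieldnames = sorted({key for row in rows for key in row})
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing fields are left as empty cells
    return len(rows)
```

Usage would be along the lines of json_to_csv("books.json", "books.csv"),
with both file names being placeholders. Columns are the union of all keys
seen, so records with fewer fields simply have blank cells.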

I can add the resultant CSV to the item record if that would be useful?


Ben


On 8 November 2016 at 10:00, Ian Ibbotson <ian.ibbotson at k-int.com> wrote:

> I don't know that this will help, but I think those resources are also
> loaded into Jisc historical texts at https://historicaltexts.jisc.ac.uk
> -- for example
> https://historicaltexts.jisc.ac.uk/results?terms=A%20Gossip%20about%20Old%20Manchester.%20With%20illustrations
> I think that there is an elasticsearch index underpinning the collections
> in JHT -- You
> don't say what you would like to extract from the data, but someone at JHT
> might be able to help? Might be worth dropping a line to the JHT enquiries
> address, YMMV tho.
>
> best,
> Ian.
>
> Ian Ibbotson
> Director
> Knowledge Integration Ltd
> 35 Paradise Street, Sheffield. S3 8PZ
> T: 0114 273 8271
> M: 07968 794 630
> W: http://www.k-int.com
> Doodle: http://doodle.com/ianibbo
>
> On 8 November 2016 at 08:41, John Levin <john at technolalia.org> wrote:
>
>> Dear list,
>>
>> The British Library has just launched
>> https://data.bl.uk/
>> with data sets including some 50,000 digitized books from 1510 to 1946.
>>
>> Infuriatingly, there isn't a simple manifest of these books. There is an
>> enormous (50 MB) JSON file
>> https://data.bl.uk/digbks/db21.html
>> which I've been trying to wrangle with little success.
>>
>> What's the best way of getting information out of this blob? Any help for
>> a JSON newbie?
>>
>> TIA
>>
>> John
>>
>> --
>> John Levin
>> http://www.anterotesis.com
>> http://twitter.com/anterotesis
>> _______________________________________________
>> okfn-discuss mailing list
>> okfn-discuss at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/okfn-discuss
>> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-discuss
>>