[okfn-discuss] British Library data JSON wrangling

John Levin john at technolalia.org
Wed Nov 9 11:14:24 UTC 2016

Works like a charm! Thank you *very* much!

Strongly suggest you add the csv file to data.bl.uk - it means people 
can get a decent overview of what's contained within, and do simple 
searches for material.

Thanks once again,


On 08/11/2016 11:05, Ben O'Steen wrote:
> Sorry about that! The main request that I was trying to satisfy is the
> demand for "everything". https://data.bl.uk is meant for all the items
> that we cannot deliver through other online means or by shipping a
> hard drive.
> The JSON structure links together a number of services that have no easy
> process to gain machine-readable connections between them, mainly
> Flickr, the BL catalog and the access portal system that you can (should
> be able to) download the PDFs from.
> I have loaded the JSON file into OpenRefine. Incidentally, it can also
> be opened with Python's json module, which is what it was created
> with.
> If you have Python installed, this script will create a flattened,
> UTF-8-encoded CSV file from the JSON, with most fields included:
> https://gist.github.com/benosteen/7dd20109bbdf7716218ba73279c70a3c
> I can add the resultant CSV to the item record if that would be useful?
> Ben
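
For readers who cannot reach the gist, the general idea of flattening nested JSON into CSV with Python's json and csv modules can be sketched as below. This is not Ben's script. The function names (`flatten`, `json_to_csv`), the dotted column names and the file names are hypothetical, and the sketch assumes the top level of the file is either a list of records or a dict mapping identifiers to records, which may not match the real data.bl.uk file.

```python
import csv
import json


def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_key = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, child_key))
    elif isinstance(value, list):
        if all(not isinstance(item, (dict, list)) for item in value):
            # A list of plain values becomes a single "; "-joined cell.
            flat[prefix] = "; ".join(str(item) for item in value)
        else:
            # A list holding nested structures gets one key per index.
            for index, child in enumerate(value):
                child_key = f"{prefix}.{index}" if prefix else str(index)
                flat.update(flatten(child, child_key))
    else:
        flat[prefix] = "" if value is None else value
    return flat


def json_to_csv(json_path, csv_path):
    """Write one CSV row per record; return the number of rows written."""
    with open(json_path, encoding="utf-8") as handle:
        data = json.load(handle)
    # Assumption: a list of records, or a dict mapping an id to a record.
    records = list(data.values()) if isinstance(data, dict) else data
    rows = [flatten(record) for record in records]
    # Records may carry different fields, so collect every column name.
    fieldnames = sorted({key for row in rows for key in row})
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

# Example call (hypothetical file names):
#   json_to_csv("books.json", "books.csv")
```

Loading a 50 MB file this way holds everything in memory at once, which is normally fine on a desktop machine. The resulting CSV opens in a spreadsheet or in OpenRefine.
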
> On 8 November 2016 at 10:00, Ian Ibbotson <ian.ibbotson at k-int.com>
> wrote:
>     I don't know that this will help, but I think those resources are
>     also loaded into Jisc historical texts
>     at https://historicaltexts.jisc.ac.uk -- for example
>     https://historicaltexts.jisc.ac.uk/results?terms=A%20Gossip%20about%20Old%20Manchester.%20With%20illustrations
>     I think that there is an elasticsearch index underpinning the
>     collections in JHT -- You don't say what you would like to extract
>     from the data, but someone at JHT might be able to help? Might be
>     worth dropping a line to the JHT enquiries address, YMMV tho.
>     best,
>     Ian.
>     Ian Ibbotson
>     Director
>     Knowledge Integration Ltd
>     35 Paradise Street, Sheffield. S3 8PZ
>     T: 0114 273 8271
>     M: 07968 794 630
>     W: http://www.k-int.com
>     Doodle: http://doodle.com/ianibbo
>     On 8 November 2016 at 08:41, John Levin <john at technolalia.org>
>     wrote:
>         Dear list,
>         The British Library has just launched
>         https://data.bl.uk/
>         with data sets including some 50,000 digitized books from 1510
>         to 1946.
>         Infuriatingly, there isn't a simple manifest of these books.
        There is an enormous (50 MB) JSON file
>         https://data.bl.uk/digbks/db21.html
>         which I've been trying to wrangle with little success.
        What's the best way of getting information out of this blob? Any
        help for a JSON newbie?
>         TIA
>         John

John Levin
