[okfn-labs] How to harvest scans from BL?

Tom Morris tfmorris at gmail.com
Thu Jan 28 20:57:02 UTC 2016


This thread is a couple of years old, but I just started working with the
British Library 19th Century Books digital collection for the Git-Lit
project <https://github.com/Git-Lit/git-lit>.  One of the first things I've
been working on is cleaning and enriching the metadata, which may be of
interest to anyone else working with it.

Lars - If you're interested, I can provide you with a list of a few hundred
volumes in Nordic languages.  One of the things that I did was run a pair
of language detectors over all the title data, so I've got a reasonable
guess at a language even for those editions which weren't coded in the MARC
record (plus found a couple hundred coded with the wrong language).
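
Which detectors were run and how their results were combined isn't spelled
out above, so the following is only a rough sketch of one way a pair of
detectors could be reconciled with the MARC language code. The function
name flag_language, the status strings, and both toy detectors are
hypothetical stand-ins, not the code actually used.

```python
def toy_stopword_detector(title):
    # Hypothetical stand-in for a real detector: guesses from a few stopwords.
    words = set(title.lower().split())
    if words & {"och", "att", "för"}:
        return "swe"
    if words & {"the", "of", "and"}:
        return "eng"
    return "und"  # MARC code for "undetermined"


def toy_charset_detector(title):
    # Hypothetical stand-in for a second detector: guesses from characters.
    return "swe" if any(c in "åäö" for c in title.lower()) else "eng"


def flag_language(title, marc_lang, detectors):
    """Return (language, status) for one title.

    marc_lang is the language code from the MARC record, or None if the
    record was not coded. The detectors are only trusted when they all agree.
    """
    guesses = {detect(title) for detect in detectors}
    if len(guesses) != 1 or "und" in guesses:
        return marc_lang, "detectors disagree"
    agreed = guesses.pop()
    if marc_lang is None:
        return agreed, "filled from detectors"
    if marc_lang != agreed:
        return agreed, "possible miscoding"
    return marc_lang, "confirmed"


detectors = [toy_stopword_detector, toy_charset_detector]
print(flag_language("Berättelser från Sverige och Finland", "eng", detectors))
print(flag_language("The history of London", None, detectors))
```

Requiring both detectors to agree before overriding or filling in a code is
a conservative choice; anything they disagree on would be left for manual
review.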

Sadly, although the British Library is mounting a big multi-stop road show
to try to get people to make more use of their digital collections, they
don't actually answer their email. I'm patching this together by trial and
error, but hopefully there will be a much more useful set of metadata at
the end of it.

Tom

On Fri, Dec 20, 2013 at 5:03 PM, Lars Aronsson <lars at aronsson.se> wrote:

> On 12/20/2013 10:00 PM, Enric Garcia Torrents wrote:
>
>>
>> From what I can see, the Item Viewer image is made up of a collection of
>> JPEGs. If I'm not mistaken, they have cut the pages into small sections.
>> The problem is not so much scraping those sections as putting the pieces
>> back together to reconstruct each page. As an example, here is one piece
>> of the 10th page of the book in your link:
>>
>>
>> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,2048,1028,1028/pct:25/0/native.jpg
>>
>
> Thanks a lot! Now it's easy. Just order one huge tile,
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,0,3028,3028/pct:100/0/native.jpg
>
> The result is the full 1807 x 2764 pixel page image, not a
> 3028 x 3028 section, because the requested region is larger
> than the page.
>
> At 300 dpi, that is 6 x 9 inches, the full book page.
>
> The last /0/ is apparently rotation, where /90/ returns
> an image rotated 90 degrees clockwise.
>
>
>
> --
>   Lars Aronsson (lars at aronsson.se)
>   Project Runeberg - free Nordic literature - http://runeberg.org/
>
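
The URL pattern in the quoted messages can be sketched in Python as below.
This is a minimal sketch, not a tested harvester: the helper names are
made up for illustration, and the assumption that the page suffix is the
page number as six uppercase hex digits is inferred only from the single
example above (page 10 -> 0x00000A). Nothing here fetches anything; it
only builds the URLs.

```python
BASE = "http://access.bl.uk/IIIFImageService"


def page_identifier(volume_ark, page_number):
    # Assumption: the suffix is "0x" plus the page number as 6 uppercase
    # hex digits, as in the one example URL (page 10 -> 0x00000A).
    return f"{volume_ark}.0x{page_number:06X}"


def iiif_url(identifier, region=(0, 0, 3028, 3028), size="pct:100", rotation=0):
    # URL layout: {base}/{identifier}/{x,y,w,h}/{size}/{rotation}/native.jpg
    # A region larger than the page appears to be clipped to the page, so
    # an oversized region returns the whole page in one request.
    x, y, w, h = region
    return f"{BASE}/{identifier}/{x},{y},{w},{h}/{size}/{rotation}/native.jpg"


volume = "ark:/81055/vdc_000000011B9A"
ident = page_identifier(volume, 10)

# Whole page in one oversized tile, as in Lars's example.
print(iiif_url(ident))

# One of the small viewer tiles, as in Enric's example.
print(iiif_url(ident, region=(0, 2048, 1028, 1028), size="pct:25"))

# Same page rotated 90 degrees clockwise.
print(iiif_url(ident, rotation=90))
```

If the page is ever larger than 3028 pixels on a side, the default region
would need to be raised to keep covering the whole page.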