[okfn-labs] How to harvest scans from BL?
Enric Garcia Torrents
enricgarcia at uoc.edu
Fri Dec 20 21:00:12 UTC 2013
From what I can see, the Item Viewer image is made up of a collection of JPEGs. If I am not mistaken, they have cut the pages into small sections. The problem is not so much scraping those sections as putting the pieces back together to reconstruct each page. As an example, here are several pieces of the 10th page of the book in your link:
http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,2048,1028,1028/pct:25/0/native.jpg
http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/1024,2048,1028,1028/pct:25/0/native.jpg
http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/1024,0,1028,1028/pct:25/0/native.jpg
http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/1024,1024,1028,1028/pct:25/0/native.jpg
http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,0,4112,4112/pct:6.25/0/native.jpg
http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,0,2056,2056/pct:12.5/0/native.jpg
http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,2048,2056,2056/pct:12.5/0/native.jpg
They seem to be using a coordinate system to name the folders, so that their item viewer can put the pieces back together. It would take a little patience to figure out their system, and their algorithm would then need to be replicated.
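A rough sketch of how the pieces might be enumerated and positioned, going only by the sample URLs above. It assumes the path after the page identifier reads as x,y,width,height of a region in full-resolution pixels, followed by a scaling percentage. In the samples the region side is always 257 times the scale factor and the offsets are multiples of 256 times the scale factor, so each piece would come out as roughly 257x257 pixels on a 256-pixel grid. The names `tiles`, `page_w` and `page_h` are hypothetical, and the full page dimensions would have to be found some other way. How the server treats regions that run past the page edge is not known.

```python
from typing import Iterator, NamedTuple

BASE = "http://access.bl.uk/IIIFImageService"


class Tile(NamedTuple):
    url: str      # URL of one piece
    paste_x: int  # where its top-left corner lands in the scaled page
    paste_y: int


def tiles(identifier: str, page_w: int, page_h: int, scale: int) -> Iterator[Tile]:
    """Yield piece URLs and paste positions for one page.

    scale is the reduction factor: 4 for pct:25, 8 for pct:12.5, 16 for pct:6.25.
    page_w and page_h are the full-resolution page size (assumed to be known).
    """
    pct = 100 / scale
    stride = 256 * scale  # spacing between region origins, inferred from the samples
    region = 257 * scale  # region side, inferred from the samples
    for y in range(0, page_h, stride):
        for x in range(0, page_w, stride):
            url = (f"{BASE}/{identifier}/{x},{y},{region},{region}"
                   f"/pct:{pct:g}/0/native.jpg")
            yield Tile(url, x // scale, y // scale)


if __name__ == "__main__":
    page = "ark:/81055/vdc_000000011B9A.0x00000A"
    # 2048 x 3072 is a made-up page size, used only for illustration.
    for t in tiles(page, 2048, 3072, scale=4):
        print(t.paste_x, t.paste_y, t.url)
```

Each downloaded piece could then be pasted at (paste_x, paste_y) onto a blank canvas with any imaging library. Since neighbouring pieces seem to overlap by one pixel, pasting them in order should simply overwrite the shared edge.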
Best regards,
Enric G. Torrents
Email: e.g.cn at ieee.org
Tel.: +8613122141470
Skype: torrents.enric
cn.linkedin.com/in/enrictorrents/
--- Original message from Lars Aronsson to okfn-labs at lists.okfn.org, sent on 20.12.2013 21:34
The British Library has scanned some books that they
let you download as PDFs. However, the PDFs are in
lower resolution than the scans that are displayed
in their online 'item viewer'.
Has anyone been able to harvest or scrape the full
resolution images from the item viewer?
Here is one such book,
http://explore.bl.uk/primo_library/libweb/action/search.do?vl%28freeText0%29=000507311&fn=search
Under "2 related resources" (red link at the right),
you will find two items where "I want this" for the
second one gives you a PDF or an 'item viewer'.
(Tell me if there is a short URL for this.)
Here's a sample from the PDF,
http://runeberg.org/elfsyssel/cow-pdf.png
The same sample from the item viewer,
http://runeberg.org/elfsyssel/cow-viewer.png
--
Lars Aronsson (lars at aronsson.se)
Project Runeberg - free Nordic literature - http://runeberg.org/
_______________________________________________
okfn-labs mailing list
okfn-labs at lists.okfn.org
https://lists.okfn.org/mailman/listinfo/okfn-labs
Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs