[okfn-labs] How to harvest scans from BL?

McGregor, Nora Nora.McGregor at bl.uk
Sat Dec 21 10:57:42 UTC 2013


If you're talking about all the scans from the Flickr Commons 1 Million Public Domain release (http://www.businessinsider.com/the-british-librarys-important-flickr-plan-2013-12 and http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html), Ben O'Steen can help get those to you. He is on holiday, so it is probably best to contact him via Twitter (@benosteen) or email labs at bl.uk. More info here: http://blpublicdomain.wikispaces.com/ and the details of the files are here:
https://github.com/BL-Labs/imagedirectory. We've got all the PDFs of the original books those images were taken from, OCR full-text files and JPGs of the pages as well.

Best,

Nora
@ndalyrose

-----Original Message-----
From: okfn-labs on behalf of Justin York
Sent: Fri 20/12/2013 9:22 PM
To: Enric Garcia Torrents
Cc: okfn-labs at lists.okfn.org
Subject: Re: [okfn-labs] How to harvest scans from BL?
 
They use Sanddragon <http://sanddragon.bl.uk/>, which appears to be the BL's
own take on Microsoft's Deep Zoom. The code is open source, so you should be
able to figure out how the API works for serving up images as Enric
described. Chances are there's a full-resolution image in there somewhere
too.
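
The tile URLs Enric quotes below look like IIIF Image API requests of the
form {identifier}/{region}/{size}/{rotation}/{quality}.{format}. Here is a
minimal Python sketch of building such a URL, assuming that pattern holds.
The function name iiif_url and its defaults are illustrative only, and
whether the BL server actually honours a "full" region at "full" size is
untested.

```python
# Assumption: the service follows the IIIF Image API URL layout seen in
# the tile URLs quoted in this thread. BASE is copied from those URLs.
BASE = "http://access.bl.uk/IIIFImageService"


def iiif_url(identifier, region="full", size="full",
             rotation=0, quality="native", fmt="jpg"):
    """Build an image request URL (hypothetical helper).

    region: "full" or an (x, y, w, h) tuple in full-resolution pixels.
    size:   "full" or a string such as "pct:25".
    """
    if not isinstance(region, str):
        # Turn (x, y, w, h) into the "x,y,w,h" path segment.
        region = ",".join(str(int(v)) for v in region)
    return f"{BASE}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    page = "ark:/81055/vdc_000000011B9A.0x00000A"
    # Reproduces one of the tile URLs from the thread.
    print(iiif_url(page, (0, 2048, 1028, 1028), "pct:25"))
    # Untested guess at a whole-page, full-resolution request.
    print(iiif_url(page))
```

If the server rejects the "full" request, the same helper can still be used
to request tiles one region at a time.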


On Fri, Dec 20, 2013 at 2:00 PM, Enric Garcia Torrents
<enricgarcia at uoc.edu> wrote:

>
> From what I see, the Item Viewer image is made of a collection of JPGs. If I
> am not wrong, they have cut the pages into small sections. The problem is
> not so much scraping those sections as putting the pieces back
> together to reconstruct each page. As an example, here are several pieces
> of the 10th page of the book in your link:
>
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,2048,1028,1028/pct:25/0/native.jpg
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/1024,2048,1028,1028/pct:25/0/native.jpg
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/1024,0,1028,1028/pct:25/0/native.jpg
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/1024,1024,1028,1028/pct:25/0/native.jpg
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,0,4112,4112/pct:6.25/0/native.jpg
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,0,2056,2056/pct:12.5/0/native.jpg
>
> http://access.bl.uk/IIIFImageService/ark:/81055/vdc_000000011B9A.0x00000A/0,2048,2056,2056/pct:12.5/0/native.jpg
>
> They seem to be using a coordinate system to name the folders, so their
> item viewer can put them back together. It would take a little patience to
> figure out their system. Their algorithm would need to be replicated.
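
A rough sketch of the reassembly arithmetic, inferred only from the URLs
above: each path carries a region x,y,w,h in full-resolution pixels and a
scale such as pct:25, so a tile's position in the scaled page would be its
region multiplied by that scale. The names Tile, parse_tile and paste_box are
hypothetical, and this is not necessarily how the BL viewer itself does it.

```python
import re
from typing import NamedTuple

# Matches ".../x,y,w,h/pct:N/rotation/native.jpg" as seen in the thread.
TILE_RE = re.compile(r"/(\d+),(\d+),(\d+),(\d+)/pct:([\d.]+)/\d+/native\.jpg$")


class Tile(NamedTuple):
    x: int        # left edge of the region, full-resolution pixels
    y: int        # top edge of the region, full-resolution pixels
    w: int        # region width, full-resolution pixels
    h: int        # region height, full-resolution pixels
    scale: float  # pct:25 becomes 0.25


def parse_tile(url):
    """Extract the region and scale from one tile URL."""
    m = TILE_RE.search(url)
    if m is None:
        raise ValueError(f"not a recognised tile URL: {url}")
    x, y, w, h = map(int, m.group(1, 2, 3, 4))
    return Tile(x, y, w, h, float(m.group(5)) / 100)


def paste_box(tile):
    """Return (left, top, right, bottom) of the tile in the scaled page."""
    left = round(tile.x * tile.scale)
    top = round(tile.y * tile.scale)
    return (left, top,
            left + round(tile.w * tile.scale),
            top + round(tile.h * tile.scale))


if __name__ == "__main__":
    url = ("http://access.bl.uk/IIIFImageService/ark:/81055/"
           "vdc_000000011B9A.0x00000A/1024,2048,1028,1028/pct:25/0/native.jpg")
    print(paste_box(parse_tile(url)))
```

On these URLs every tile comes out 257 pixels square (1028 at 25%, 2056 at
12.5%, 4112 at 6.25%), and neighbouring tiles at the 25% level start 256
pixels apart, so adjacent tiles appear to overlap by one pixel. An imaging
library such as Pillow could then paste each downloaded tile at the
(left, top) corner of its box.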
>
>
> Best regards,
>
> Enric G. Torrents
> Email: e.g.cn at ieee.org
> Tel.: +8613122141470
> Skype: torrents.enric
> http://cn.linkedin.com/in/enrictorrents/
>
>
> --- Original message from Lars Aronsson <lars at aronsson.se> to
> okfn-labs at lists.okfn.org, sent on 20.12.2013 21:34
>
> The British Library has scanned some books that they
> let you download as PDFs. However, the PDFs are in
> lower resolution than the scans that are displayed
> in their online 'item viewer'.
>
> Has anyone been able to harvest or scrape the full
> resolution images from the item viewer?
>
> Here is one such book: http://explore.bl.uk/primo_library/libweb/action/search.do?vl%28freeText0%29=000507311&fn=search
>
> Under "2 related resources" (red link at the right),
> you will find two items where "I want this" for the
> second one gives you a PDF or an 'item viewer'.
> (Tell me if there is a short URL for this.)
>
> Here's a sample from the PDF: http://runeberg.org/elfsyssel/cow-pdf.png
>
> The same sample from the item viewer: http://runeberg.org/elfsyssel/cow-viewer.png
>
>
> --
>    Lars Aronsson (lars at aronsson.se)
>    Project Runeberg - free Nordic literature - http://runeberg.org/
>
>
> _______________________________________________
> okfn-labs mailing list
> okfn-labs at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/okfn-labs
> Unsubscribe: https://lists.okfn.org/mailman/options/okfn-labs
>



