[open-bibliography] [ol-discuss] Multivolume works

Lars Aronsson lars at aronsson.se
Thu May 3 23:16:58 UTC 2012

On 2012-05-04 00:51, Karen Coyle wrote:
> The difficulty seems to arise in the process of scanning. For the
> purposes of scanning, each physical volume becomes a scanned file.

At any serious scale (e.g. Google or the Internet Archive), I think
book scanning needs to be organized as multiple workstations,
each taking its portion of a day's batch of books. That means
the 10 or 20 volumes of an encyclopedia will be scanned
by different people, each generating a job that goes through
OCR and postprocessing, so each volume needs its own metadata.

However, with Google I often find that volumes 2 and 5 are all that
has been scanned. And at the Internet Archive I sometimes find that
everything except volumes 2 and 7 has been scanned. So there is more
chaos than necessary.
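
To make the point concrete, here is a rough sketch of how per-volume
metadata could hang off a set-level record, so that gaps like the ones
above become easy to spot. Everything in it is an assumption for
illustration only: the class names, the field names, the identifiers and
the "example-archive" source are made up, and none of it reflects how
Google, the Internet Archive or Open Library actually store their data.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeScan:
    """One scanned physical volume; all field names are hypothetical."""
    volume_number: int
    source: str       # where the scan lives (made-up value below)
    identifier: str   # the scan's identifier at that source (made up)


@dataclass
class MultivolumeWork:
    """A set-level record tying the per-volume scans together."""
    title: str
    total_volumes: int
    scans: list[VolumeScan] = field(default_factory=list)

    def missing_volumes(self) -> list[int]:
        """Return the volume numbers for which no scan is recorded."""
        scanned = {scan.volume_number for scan in self.scans}
        return [n for n in range(1, self.total_volumes + 1)
                if n not in scanned]


# A ten-volume set where everything except volumes 2 and 7 was scanned.
encyclopedia = MultivolumeWork(
    title="Example encyclopedia",
    total_volumes=10,
    scans=[VolumeScan(n, "example-archive", f"example-vol-{n:02d}")
           for n in range(1, 11) if n not in (2, 7)],
)
print(encyclopedia.missing_volumes())  # [2, 7]
```

Each volume keeps its own record, as the scanning workflow requires, but
the set-level record knows how many volumes there should be, so the
holes can be listed instead of discovered by accident.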

When we're trying to use scanned books for reference and
for proofreading the text, we must hunt down individual parts
from different sources. The prime example must be the German
branch of Wikisource, here trying to find all 143 parts of the
Weimar edition (1887-1919) of Goethe's collected works,

Now, the structure shown on that wiki page is something that
should go into OpenLibrary.org, because it is open (as all
of Wikisource is free and open) bibliographic data.

   Lars Aronsson (lars at aronsson.se)
   Project Runeberg - free Nordic literature - http://runeberg.org/
