Crowdsourcing History:
Collaborative Online Transcription and Archives
Session Abstract

Large-scale digitization of manuscript materials has recently made new
volumes of primary sources available to a global audience via the
Internet. However, transcribing these materials to enable searches or
other kinds of algorithmic processing poses a significant challenge in
terms of labor required. Scholars and cultural heritage institutions
are increasingly exploring the use of collaborative online approaches
(also called “crowdsourcing”) as a way to address these challenges.

This session seeks to explore the potential and pitfalls of
crowdsourcing as a method for collecting transcriptions and for
teaching wider audiences about reading and using historical
manuscripts. We seek to bring together public historians, archivists,
academic historians, technologists, and other scholars to learn about
and discuss these projects and the future of crowdsourcing in the
historical profession.


Here are the full details for our presentations, only some of which we
were able to submit to the Program Committee. (I’ve omitted the
250-word biographical statements for presenters, but you can find most
of that information at the websites linked below.)

Crowdsourcing Transcription of the Papers of the War Department using Scripto
Sharon Leon, Center for History and New Media

In 1800, the United States War Department burned to the ground. The
important materials of that archive were lost to historians until an
intrepid group of researchers began the mission to reconstitute the
papers by collecting and scanning received copies and materials from
other archives. Today, those images, representing nearly 55,000
documents, are available to researchers from the Papers of the War
Department, 1784-1800 project website. In keeping with the origins of
this non-traditional digital archive, PWD is embarking on a new
venture to revolutionize the work of documentary editors by opening
the archive up for crowdsourcing of transcription.

This innovation in editing practice is facilitated by the use of the
Center for History and New Media’s newest open source tool: Scripto.
Scripto allows users to contribute transcriptions to online
documentary projects. The tool includes a versioning history and full
set of editorial controls, so that project staff can manage public
contributions. The crowdsourcing work with PWD serves as a case study
for other documentary projects that might want to pursue similar
methods for beginning transcription, measurably improving their search
corpus, and creating a vibrant community of users among scholarly
researchers, students and teachers, and members of the general public.
CHNM will capture the lessons learned with PWD in the form of a guide
for editors, and will share those lessons with the audience of the
AHA’s 2012 annual meeting.

Transcribing Jeremy Bentham
Valerie Wallace, The Bentham Project, University College London

In his will Jeremy Bentham, the great philosopher and reformer who
lived from 1748 until 1832, requested that after his death his body be
preserved in a box and put on display. He suggested that an
accompaniment to this ‘auto-icon’ might be his ‘unedited and
unfinished manuscripts, lodged in an appropriate case of shelves’.
Bentham would have approved therefore of the Transcribe Bentham
initiative, a project whose aim is to digitise these unedited and
unfinished papers, of which there are 60,000, and put them on what is
arguably an appropriate case of shelves for the twenty-first century:
the internet.

Transcribe Bentham is run by the Bentham Project in the Faculty of
Laws at University College London in collaboration with the UCL Centre for
Digital Humanities. The Bentham Project is responsible for the
publication of the Collected Works of Jeremy Bentham, an authoritative
edition of the philosopher’s writings based on the original manuscript
papers. The aim of the Transcribe Bentham initiative is to digitise
and crowdsource the transcription of these manuscripts. The Transcribe
Bentham team has designed a Transcription Desk using MediaWiki where
users can log in, view, and transcribe Bentham’s papers, encoding
their transcripts in TEI-compliant XML. The project aims to digitise
at least 12,500 manuscripts in a year. This presentation will discuss
the project’s experience of crowdsourcing and the quantitative and
qualitative data generated by the initiative, offering thoughts on the
future of collaborative manuscript transcription and the impact of
crowdsourcing on an academic editorial project.

T-PEN — Transcription for Paleographical and Editorial Notation
James Ginther, Center for Digital Theology, Saint Louis University

T-PEN (Transcription for Paleographical and Editorial Notation) is a
digital tool for scholars who use digital images of unpublished
manuscripts that are housed in digital repositories throughout the
world. T-PEN will provide a fully-equipped digital workspace in which
the scholar — while constantly viewing the manuscript images —
transcribes line by line, makes notes about problematic paleographic
features, documents glosses and corrections or revisions to the
manuscript, and may—either during transcription or after further
research—add interpretative or bibliographic information pertaining to
particular lines or larger sections of the text. With this tool, the
transcribed text can also be immediately encoded with XML markup to
indicate any given feature of the text (e.g., a rubric, colophon,
gloss, lemma, correction, quire signature, citation, etc.). T-PEN will
be ready for release as an open-source web-based application by April
2012 and will be in beta testing at the time of the AHA conference in
January 2012.

Invisible Australians: Living under the White Australia Policy
Kate Bagnall and Tim Sherratt

Invisible Australians aims to reveal something of the lives of the
thousands of men, women and children who were affected by the
racially-based immigration policy of early 20th-century Australia. To
administer the Immigration Restriction Act, government officials
implemented an increasingly complex and structured system of tracking
and documenting the movements of non-white people as they travelled in
and out of the country. This surveillance left an extraordinary body
of records containing information about people who, according to the
national myth of a ‘White Australia’, were not Australian at all.
Using crowdsourced transcription, our project intends to extract
biographical data from these records, piece together these fragments
of identity and work towards revealing the real face of White
Australia.

Crowdsourcing access to women’s history in Western Australia
Jennifer Griffiths

The project I hope to run aims to use crowdsourcing techniques on the
resources of the Western Australian State Library, State Records
Office and Museum to access women’s stories in the records in order to
support the heritage industry in improving the representation of women
in the State Heritage Register. Currently, women are seriously
under-represented in both the Register and the historical research of
heritage in WA. This impacts significantly on the community’s
understanding of women’s lives in WA in the past. This in turn has
ramifications for the way contemporary women are represented and
valued. While the project on its own is unlikely to change community
perceptions of women’s pasts, it will be part of a network of actions
that will achieve this. The project also aims to introduce history and
heritage professionals in WA to feminist history practices (how
records are read with a feminist lens) and produce a database of
women’s stories and histories that will be important to the future
study of women in WA. In addition, the project will allow
professionals to participate in a crowdsourcing project, thus
introducing them to using technology in new ways for research.

User participation and collaborative creativity
Alexandra Eveleigh, Department of Information Studies, University College London

My research looks at the impact of user participation and
‘collaborative creativity’ upon archival theory and practice, with a
particular focus on users’ involvement in archival description and
metadata creation/reuse. It is funded by a UK Arts and Humanities
Research Council collaborative doctoral award, the partners being
University College London and The National Archives.

My working research questions are:
1. Is user participation an evolution or revolution in archival
   practice & professionalism?
2. What contexts and circumstances encourage and motivate users to
   participate in archival description?
3. What impact do participatory methodologies have upon (a) the archive
   service (b) existing users (c) new users and broader society?

The objectives are essentially to distinguish between what works and
what doesn’t, and why: to explore some of the realities behind the
claims made regarding experts, crowds and volunteer communities, and
seek to understand what moves to allow a multiplicity of voices to
supplement or even supplant the authoritative professional voice might
mean for notions of archival value and traditional communities of
archive users.

Linked Data, Transcription, and Markup for Archives and Communities
Abigail Belfrage, Public Record Office Victoria, http://www.prov.vic.gov.au

Public Record Office Victoria (PROV) is the archival authority for the
State of Victoria, Australia. In partnership with a number of research
and community-based organisations PROV is developing an open-source,
web-based crowdsourcing transcription and semantic (& geo-location)
mark-up app. The aim is not just to create a valuable body of linked
data and images from the state’s archives, but to enable access to the
transcription & markup functionality for communities to use on their
own projects.

Crowdsourcing Historical Climate Data and Papyrus Transcriptions
Chris Lintott, Citizen Science Alliance

Chris Lintott is the chair of the Citizen Science Alliance, which
builds and operates the Zooniverse network of online citizen science
projects. The network grew from Galaxy Zoo, which invited participants
to classify a million galaxies. More than 350,000 people have taken
part in Zooniverse projects, which include Old Weather – which
transcribes historical and climate data from ships’ logs – and a
project to transcribe the Oxyrhynchus papyri.

FromThePage: a web-based tool for transcribing, indexing, and
annotating handwritten material
Ben Brumfield, software engineer, Beta.FromThePage.com

Ben Brumfield is a software engineer in Austin, Texas with more than a
dozen years of experience developing web-based, database-driven
software. Since 2005 he has been building FromThePage, a web-based
tool for transcribing, indexing, and annotating handwritten material.
This tool has been used to transcribe over 1500 pages of family
diaries and is now being used by the San Diego Natural History Museum
to transcribe and analyze naturalists’ field notes from the early 20th
century. In 2010, FromThePage was released under a Free/Open Source
license. Brumfield blogs about manuscript transcription technology at

