[open-literature] The 'Northwestern' Shakespeare: New Texts for Open Shakespeare

James Harriman-Smith jam3s.h.s at gmail.com
Fri Mar 19 10:52:43 UTC 2010

Hi everyone,

We got an email the other day from a professor at Northwestern University
offering us the use of an online edition of Shakespeare that they have
prepared, one of a much higher standard than the Moby Shakespeare. (The
correspondence is appended below.)

Obviously we would love to use this, but there are one or two questions.
First (and I profess complete ignorance on this count), the Northwestern
texts are coded in TEI XML, and I'm wondering if this may cause problems
with our annotation tool? Or will just the fact of changing our base texts
for annotation cause considerable problems when it comes to transferring the
glosses that have already been made?
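
To make the TEI question a little more concrete, here is a minimal sketch of
what reading such a file might involve. It assumes the standard TEI drama
elements (sp for a speech, speaker, l for a verse line, p for prose) and no
XML namespace; the fragment and the extract_speeches helper are made up for
illustration and have not been checked against the actual Northwestern files
or against our annotation tool.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment using standard TEI drama elements. The real
# Northwestern files may use namespaces or different attributes (assumption).
SAMPLE = """<div type="scene">
  <sp who="hamlet">
    <speaker>Hamlet</speaker>
    <l>To be, or not to be: that is the question:</l>
    <l>Whether 'tis nobler in the mind to suffer</l>
  </sp>
  <sp who="first_clown">
    <speaker>First Clown</speaker>
    <p>Is she to be buried in Christian burial?</p>
  </sp>
</div>"""


def extract_speeches(xml_text):
    """Return (speaker, kind, text) tuples, where kind is 'verse' or 'prose'."""
    root = ET.fromstring(xml_text)
    rows = []
    for sp in root.iter("sp"):
        speaker = sp.findtext("speaker", default="").strip()
        for child in sp:
            # <l> marks a verse line, <p> marks a prose paragraph
            if child.tag == "l":
                rows.append((speaker, "verse", "".join(child.itertext()).strip()))
            elif child.tag == "p":
                rows.append((speaker, "prose", "".join(child.itertext()).strip()))
    return rows


for row in extract_speeches(SAMPLE):
    print(row)
```

If the glosses we already have are keyed to line text rather than to character
offsets in the Moby files, a table like this might be one starting point for
matching them against the new texts, though that is only a guess.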

Secondly, the Northwestern texts are copyrighted (again, see below) and this
may conflict with our Open licence. The professor has said that he doesn't
mind their distribution for non-profit use, but would want a cut of any
future profits since he and others have invested a lot of time in
establishing these texts.

Anyway, what does everyone think? All the details are appended below.



The WordHoard Shakespeare is a joint project of the Perseus Project at Tufts
The Northwestern University Library, and Northwestern University Academic
Technologies. It is derived from The Globe Shakespeare, the one-volume
version of the Cambridge Shakespeare, edited by W. G. Clark, J. Glover, and
W. A. Wright (1891-3). The Internet Shakespeare editions of the quartos and
folios have been consulted to create a modern text that observes as closely
as possible the morphological and
practices of the earliest editions. Spellings, especially of contracted and
forms, have been standardized across the corpus. The text has been fully
and morphosyntactically tagged.

© 2003. The copyright to The WordHoard Shakespeare is owned jointly by
Northwestern University and Tufts University. The WordHoard Shakespeare is
provided for free solely for non-commercial use by students, scholars, and
the public. Any commercial use or publication of it, in whole or in part,
without prior written authorization of the copyright holders is strictly
prohibited.


Dear James Harriman Smith and colleagues,

I looked at your Open Shakespeare and found much to like. I also have
some things to offer. At Northwestern University we prepared a
Shakespeare text that exists in two versions. One is linguistically
annotated, that is to say every word is associated with a lemma and
part of speech. The other is a plain XML file encoded in TEI.

You can see what we have done with these texts in WordHoard at
http://wordhoard.northwestern.edu. The texts are in the public domain
and you can download them from http://monkproject.org/downloads/

This Shakespeare text is a lot better than the MOBY text on several
fronts. It is derived from a digital scan of the Globe Shakespeare
that went through several rounds of curation in the Perseus Project. I
worked on it and checked ~ 5,000 passages where the Folio and Quarto
versions differed. I looked at the various standard modern
Shakespeares that are descended from the Globe text (Riverside, Arden,
Bevington) with the aim of establishing a modern eclectic consensus.

So the texts are better than Moby. The encoding is more granular.
Prose and verse are distinguished. Speakers are identified by name
and sex. There is other stuff.

I'd be interested in getting this text more widely used, and if you're
interested in using it in your project, I'll be happy to provide
advice on this and that, although I don't have a lot of time to give
to this project.

Martin Mueller
Professor of English and Classics


Dear James,

I'm not a lawyer, so all this copyright stuff tends to confuse me. I think
that both Tufts and Northwestern, the copyright holders, are OK with any use
where no money changes hands. If money does change hands, we would want to
know, and if it's a profit making enterprise we would remember that a lot of
labor went into this and that we might want to recoup some of this.

Whether any of these contingencies would ever come up is another matter. I
doubt it, and I wouldn't want some remote contingency standing in the way of
wider use of the WordHoard Shakespeare. But on all this copyright stuff we
have two players involved, and I would need to check with them about what is
OK or not.

The WordHoard Shakespeare is encoded in TEI-XML. If your annotation tool
can't handle that, you should probably work on it, because the majority of
academically respectable and free texts are encoded in some flavour of it.

I want to make this work. In the meantime, feel free to share my letter with

