[open-humanities] Annotation Data Dump

James Harriman-Smith james.harriman-smith at cantab.net
Thu Dec 1 13:43:36 UTC 2011


Andrew Magliozzi has made the full XML database of Harvard annotations
available for download here:

@ Nick, @ Mark: any thoughts?

Readme and Andrew's original message below.

Have a good weekend,


---------- Forwarded message ----------
From: Andrew Magliozzi <andrew.magliozzi at gmail.com>
Date: 28 November 2011 19:20
Subject: Annotation Data
To: rufus.pollock at okfn.org
Cc: James Harriman-Smith <james.harriman-smith at cantab.net>

Dear Rufus and James,

Thanks for your patience as we took some time off for Thanksgiving weekend
here in the US.  I hope all is well in the UK and I still look forward to
meeting you if/when you arrive in Boston.  Without further ado, here is a
large SQL database of all of our books and annotations (in HTML format):


As I mentioned, there are a few gotchas and little details to consider when
trying to parse our data.  Here's a brief run-down of the specific things
to look for:

1) Images are included in some (but not many) annotations.  Macbeth and the
Divine Comedy are two specific books that come to mind for this.

2) Some annotations are linked directly to others (via a permalink for each
annotation).  Dante's Divine Comedy has most of these, though some
Shakespeare plays do as well.

3) There may also be some other metadata, such as "annotation type."  I
believe we had a few tags, such as "potential spoiler," "historical
context," and "close reading."  I think there were five categories in
total, but I can't recall the others off the top of my head.
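
For anyone starting on the parsing, points 1) and 2) above can be checked
with a short script.  This is only a minimal sketch: it assumes each
annotation body has already been pulled out of the dump as an HTML string,
and the names AnnotationScanner and scan_annotation are made up for
illustration.  It uses Python's standard html.parser module and simply
lists image sources and link targets.  Which links are permalinks to other
annotations depends on the site's URL pattern, which is not described
here, so that filtering is left out.

```python
from html.parser import HTMLParser


class AnnotationScanner(HTMLParser):
    """Collects image sources and link targets from one annotation's HTML."""

    def __init__(self):
        super().__init__()
        self.images = []  # values of <img src="...">
        self.links = []   # values of <a href="...">

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def scan_annotation(html):
    """Return the images and links found in a single annotation body."""
    scanner = AnnotationScanner()
    scanner.feed(html)
    scanner.close()
    return {"images": scanner.images, "links": scanner.links}
```

Running scan_annotation over every annotation and keeping those with a
non-empty "images" or "links" list would give a quick picture of which
books need special handling.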

Last but not least, I would like to maintain the original FinalsClub
usernames for authorship of each annotation.  In order to properly cite
FinalsClub.org, could you also place a link to our site with proper
Creative Commons attribution in the footer of your site for each of the
documents for which we provide annotations?  I'm flexible with how we do
this, but it's good to get this discussion started now.

Thanks again for helping us get this knowledge out to the world.  I look
forward to working with you in the near and long term.  And of course,
please let me know if you have any questions, comments, or concerns.

Andrew Magliozzi
Founder, FinalsClub.org

PS - We'll be working to get you a more structured version of our data, but
this format should suffice to get the project started.

James Harriman-Smith
Lecteur d'anglais
ENS de Lyon
Bureau F323