[open-humanities] [humanities-dev] Visualizing English Word Origins

Tom Oinn tom.oinn at okfn.org
Fri May 4 10:50:55 UTC 2012


On 4 May 2012 10:30, Sam Leon <sam.leon at okfn.org> wrote:
> Hi Todd,
>
> (Cc-ing humanities discussion list as well)
>
> Thanks for this, really interesting. I especially like the pie charts
> summarising the proportion of words with a particular origin.
>
> In some future instance of TEXTUS an etymological class of annotations could
> be implemented.
>
> @Tom -- do you see any reason in principle why this wouldn't be possible?

Trivial really. I was thinking the other day we could do with a
'definition' annotation class. Some of the example texts you
suggested have a lot of single-word links to Wikipedia or dictionary
articles, so building the etymology into that kind of annotation
would make a lot of sense and be relatively simple.

The processing could easily be done by something which pulls text
from the TEXTUS API and writes annotations back, assuming there's a
script somewhere which can get the etymology for a given word.
Alternatively, or in addition, we could do it at import. Either way
we'd then have the annotations in the database, and the charts should
just be a simple db query.
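
A rough sketch of that write-back flow, in Python. Everything named
here is a hypothetical stand-in: lookup_etymology and its toy table
take the place of a real etymology script, and a plain list takes the
place of the TEXTUS API and database, neither of which is modelled.

```python
import re
from collections import Counter

# Toy stand-in for "a script somewhere which can get the etymology
# given a word"; a real version would consult a dictionary source.
TOY_ETYMOLOGIES = {"frog": "OE", "the": "OE", "sky": "ON", "river": "OF"}


def lookup_etymology(word):
    """Return an origin code for a word, or None if it is unknown."""
    return TOY_ETYMOLOGIES.get(word.lower())


def annotate(text):
    """Build one annotation per word whose origin is known."""
    annotations = []
    for match in re.finditer(r"[A-Za-z]+", text):
        origin = lookup_etymology(match.group())
        if origin is not None:
            annotations.append({
                "start": match.start(),  # character offsets into the text
                "end": match.end(),
                "type": "definition",
                "etym": origin,
            })
    return annotations


def origin_counts(annotations):
    """Stand-in for the 'simple db query' behind the charts."""
    return Counter(a["etym"] for a in annotations)


annotations = annotate("The frog sat by the river under the sky")
counts = origin_counts(annotations)
```

The counts per origin are all a pie chart would need; words with no
known origin are simply left unannotated in this sketch.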

So - suppose we have an annotation type which is intended to apply to
a single word or short phrase. What would it contain? I suspect what
we need are some link categories for dictionary definitions and for
encyclopaedic articles. To add the etymology, would a single origin
property be sufficient? I'm thinking that anything more complex should
really be a link to somewhere else, but for stats gathering this would
be enough. So you might have the following annotation for the word
'frog':

definition : {
  dict : [ "http://dictionary.reference.com/browse/frog?s=t" ],
  encyc : [ "http://en.wikipedia.org/wiki/Frog",
            "http://www.britannica.com/EBchecked/topic/220611/frog" ],
  etym : "OE"
}
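
For illustration only, the same structure could be assembled and
serialised as JSON along these lines. The make_definition helper is a
hypothetical name, and the check that etym is a single non-empty
string is an assumption about how strict the annotation should be.

```python
import json


def make_definition(dict_links, encyc_links, etym):
    """Assemble a 'definition' annotation with the three fields above."""
    # Assumption: etym is one origin code, not a list or a longer history.
    if not isinstance(etym, str) or not etym:
        raise ValueError("etym should be a single non-empty origin code")
    return {
        "definition": {
            "dict": list(dict_links),
            "encyc": list(encyc_links),
            "etym": etym,
        }
    }


frog = make_definition(
    ["http://dictionary.reference.com/browse/frog?s=t"],
    ["http://en.wikipedia.org/wiki/Frog",
     "http://www.britannica.com/EBchecked/topic/220611/frog"],
    "OE",
)
frog_json = json.dumps(frog, indent=2)
```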

Is there a sensible controlled vocabulary for languages, including
archaic ones? They certainly exist for modern languages, but I don't
know if there's a similar list which would cover the variations of Old
English, Norse etc.

Tom
-- 
Tom Oinn
+44 (0) 20 8123 5142 or Skype ID 'tomoinn'
http://www.crypticsquid.com



