[humanities-dev] Visualizing English Word Origins
Tom Oinn
tom.oinn at okfn.org
Fri May 4 10:50:55 UTC 2012
On 4 May 2012 10:30, Sam Leon <sam.leon at okfn.org> wrote:
> Hi Todd,
>
> (Cc-ing humanities discussion list as well)
>
> Thanks for this, really interesting. I especially like the pie charts
> summarising the proportion of words with a particular origin.
>
> In some future instance of TEXTUS an etymological class of annotations could
> be implemented.
>
> @Tom -- do you see any reason in principle why this wouldn't be possible?

Trivial really. I was thinking the other day that we could do with a
'definition' annotation class: some of the example texts you
suggested have a lot of single-word links to Wikipedia or dictionary
articles, so building the etymology into that kind of annotation
would make a lot of sense and be relatively simple.

The processing logic could easily be done by something that pulls
text from the TEXTUS API and writes back annotations, assuming there's a
script somewhere which can get the etymology given a word. We could also
do it at import, either instead or as well. Either way we'd then have the
annotations in the database, and the charts should just be a simple db query.
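
As a rough illustration of that processing step, here is a minimal Python sketch. Everything in it is an assumption made for the example: the toy lookup table stands in for the "script somewhere" that returns an etymology, the codes other than "OE" are made-up placeholders, and the start/end/word/etym fields are not the actual TEXTUS annotation format. Fetching text from the API and writing annotations back are left out entirely.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a real etymology lookup service or script.
# The codes are illustrative placeholders, not a controlled vocabulary.
TOY_ETYMOLOGIES: Dict[str, str] = {"frog": "OE", "river": "OF", "sky": "ON"}


def lookup_etymology(word: str) -> Optional[str]:
    """Return an origin code for a word, or None if it is unknown."""
    return TOY_ETYMOLOGIES.get(word.lower())


def annotate(
    text: str,
    lookup: Callable[[str], Optional[str]] = lookup_etymology,
) -> List[dict]:
    """Build one annotation per word whose origin the lookup knows.

    Offsets are character positions in `text`, with `end` exclusive.
    """
    annotations = []
    for match in re.finditer(r"[A-Za-z]+", text):
        origin = lookup(match.group())
        if origin is not None:
            annotations.append(
                {
                    "start": match.start(),
                    "end": match.end(),
                    "word": match.group(),
                    "etym": origin,
                }
            )
    return annotations


if __name__ == "__main__":
    for annotation in annotate("The frog saw the sky"):
        print(annotation)
```

The same function could be called either by a separate process working through the API or from the import code, since it only needs the plain text.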

So, suppose we have an annotation type which is intended to apply to
a single word or short phrase: what would it contain? I suspect what
we need are some link categories for dictionary definitions and for
encyclopaedic articles. To add the etymology, would a single origin
property be sufficient? (I'm thinking that anything more complex should
really be a link to somewhere else, but for stats gathering this would
be enough.) So you might have the following annotation for the word
'frog':

  definition : {
    dict  : [ "http://dictionary.reference.com/browse/frog?s=t" ],
    encyc : [ "http://en.wikipedia.org/wiki/Frog",
              "http://www.britannica.com/EBchecked/topic/220611/frog" ],
    etym  : "OE"
  }
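
To show why a single origin property would be enough for stats gathering, here is a small Python sketch that builds annotations of the shape above and works out the proportion of words per origin, which is all the pie charts need. The helper names make_definition and origin_proportions are made up for this example, and the codes other than "OE" are placeholders rather than an agreed vocabulary.

```python
from collections import Counter
from typing import Dict, Iterable, List


def make_definition(dict_links: List[str], encyc_links: List[str], etym: str) -> dict:
    """Build a 'definition' annotation with the three proposed properties."""
    return {
        "definition": {
            "dict": list(dict_links),
            "encyc": list(encyc_links),
            "etym": etym,
        }
    }


def origin_proportions(annotations: Iterable[dict]) -> Dict[str, float]:
    """Return the fraction of annotations per origin code."""
    counts = Counter(a["definition"]["etym"] for a in annotations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {origin: n / total for origin, n in counts.items()}


if __name__ == "__main__":
    frog = make_definition(
        ["http://dictionary.reference.com/browse/frog?s=t"],
        [
            "http://en.wikipedia.org/wiki/Frog",
            "http://www.britannica.com/EBchecked/topic/220611/frog",
        ],
        "OE",
    )
    # The remaining annotations carry no links; only the origin matters here.
    others = [make_definition([], [], code) for code in ("OE", "ON", "OF")]
    print(origin_proportions([frog] + others))
```

In a real deployment the counting would presumably be a grouped query against the annotation store rather than an in-memory tally.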

Is there a sensible controlled vocabulary for languages, including
archaic ones? Such lists certainly exist for modern languages, but I
don't know if there's a similar one which would cover the variations of
Old English, Norse etc.
Tom
--
Tom Oinn
+44 (0) 20 8123 5142 or Skype ID 'tomoinn'
http://www.crypticsquid.com