[open-linguistics] Metrics in linguistics
Christian Chiarcos
chiarcos at informatik.uni-frankfurt.de
Sat May 7 11:47:12 UTC 2016
Dear Bahman,
That depends crucially on the problem you're interested in.
Some coarse-grained metrics:
- For approximating phonological complexity, you can just count the number
of phonemes (e.g., using http://phoible.org). But this gives only an
incomplete picture, as it doesn't quantify phonological processes.
- For approximating morphological complexity, the easiest thing would be
to count the number of morphemes a language possesses. But that requires a
morphological parser or a formalized grammar fragment. If neither is
available, but you have larger amounts of text, you can use an
unsupervised morph segmenter, e.g., Morfessor
(http://www.cis.hut.fi/projects/morpho/). The segmenter output gives an
incomplete picture, too, as it doesn't quantify complex morphophonological
processes (e.g., assimilation) and doesn't distinguish morphemes from morphs.
- For approximating syntactic complexity, you need to define what you
mean. Simply counting the average number of words per sentence might give
you an idea, but that doesn't actually assess syntax. More heavy-weight
metrics exist, e.g., Hawkins' syntactic weight,* but they normally operate
on phrases or (at most) sentences, not languages, so you'd again need to
average over a corpus. They are also specific to the grammatical
framework you use for parsing.
* Hawkins (1994, see
http://www.cambridge.org/us/academic/subjects/languages-linguistics/grammar-and-syntax/performance-theory-order-and-constituency)
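
To make the three counts above concrete, here is a minimal Python sketch. It is not taken from PHOIBLE, Morfessor, or Hawkins: the function names and the input formats (language/phoneme pairs, words already split into morphs, naive sentence splitting on ".", "!" and "?") are assumptions made purely for illustration, and you would have to prepare such input from your own data.

```python
import re
from collections import defaultdict


def inventory_sizes(rows):
    """Count distinct phonemes per language.

    `rows` is an iterable of (language, phoneme) pairs; this input
    format is an assumption, not the layout of any particular database.
    """
    inventories = defaultdict(set)
    for language, phoneme in rows:
        inventories[language].add(phoneme)
    return {language: len(phonemes) for language, phonemes in inventories.items()}


def morph_type_count(segmented_words):
    """Count distinct morphs in a corpus of pre-segmented words.

    `segmented_words` is a list of lists of strings, one inner list per
    word, e.g. as produced by some morph segmenter (hypothetical input).
    This counts morphs, not morphemes: allomorphs are counted separately.
    """
    return len({morph for word in segmented_words for morph in word})


def mean_words_per_sentence(text):
    """Average number of whitespace-separated tokens per sentence.

    Sentences are split naively on '.', '!' and '?', which is a crude
    assumption that fails for abbreviations and many writing systems.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)
```

To compare languages, you would run each function over comparable corpora and inventories and put the resulting numbers side by side. The caveats from the list apply unchanged: none of these counts says anything about phonological or morphophonological processes, or about syntactic structure.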
These are crude (but cheap) metrics, sufficient for people with a focus on
number crunching, but nothing that a philologist would be likely to accept.
Everything beyond this is broadly in the realm of linguistic
typology, an entire sub-discipline that works on this
problem. If you want to go this way, you may want to ask this question again
on the LINGTYP list (LINGTYP at listserv.linguistlist.org).
Best,
Christian
On .05.2016, 15:10, Bahman Jabbarian Amiri
<jabbarian at ut.ac.ir> wrote:
> Dear Folks
> As a person whose expertise is out of linguistics I was curious whether
> there are some kinds of metrics in linguistics by which one might be
> able to measure and then compare different languages in view point of
> structure, composition, configuration and complexity.
> I would appreciate it if anyone might suggest paper or any reading
> sources on this matter.
>
> Kind regards
> Bahman J. Amiri
--
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany
office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931