[open-linguistics] Metrics in linguistics

Christian Chiarcos chiarcos at informatik.uni-frankfurt.de
Sat May 7 11:47:12 UTC 2016


Dear Bahman,

That crucially depends on the problem you're interested in.

Some coarse-grained metrics:

- For approximating phonological complexity, you can just count the number  
of phonemes (e.g., using http://phoible.org). But this gives only an
incomplete picture, as it doesn't quantify phonological processes.

- For approximating morphological complexity, the easiest thing would be  
to count the number of morphemes a language possesses. But that requires a  
morphological parser or a formalized grammar fragment. If neither is  
available, but you have larger amounts of text, you can use an  
unsupervised morph segmenter, e.g., Morfessor  
(http://www.cis.hut.fi/projects/morpho/). This, too, gives an incomplete
picture, as it doesn't quantify complex morphophonological processes (e.g.,
assimilation) and doesn't distinguish morphemes from morphs.

- For approximating syntactic complexity, you need to define what you  
mean. Simply counting the average number of words per sentence might give  
you an idea, but that doesn't actually assess syntax. More heavy-weight
metrics exist, e.g., Hawkins' syntactic weight,* but they normally operate
on phrases or (at most) sentences, not languages, so you'd again need to
average over a corpus, and they're specific to the grammatical
framework you use for parsing.
*   Hawkins (1994, see  
http://www.cambridge.org/us/academic/subjects/languages-linguistics/grammar-and-syntax/performance-theory-order-and-constituency)
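
To make the three counts above concrete, here is a minimal Python sketch.
It is an illustration under assumptions, not a reference implementation:
the function names and input formats are hypothetical. It assumes you have
already extracted (language, phoneme) pairs from a resource such as
PHOIBLE, that your words are already segmented into morphs (e.g., by a
segmenter such as Morfessor), and that sentences can be split naively on
".", "!" and "?". It calls neither PHOIBLE nor Morfessor itself.

```python
import re
from collections import defaultdict


def phoneme_inventory_sizes(rows):
    """Count distinct phonemes per language.

    `rows` is an iterable of (language, phoneme) pairs; this layout is an
    assumption for illustration, not the actual PHOIBLE export format.
    """
    inventories = defaultdict(set)
    for language, phoneme in rows:
        inventories[language].add(phoneme)
    return {language: len(phonemes) for language, phonemes in inventories.items()}


def morph_type_count(segmented_words):
    """Count distinct morphs in pre-segmented words.

    `segmented_words` is an iterable of lists of morph strings. Note that
    this counts morphs (surface forms), not morphemes.
    """
    return len({morph for word in segmented_words for morph in word})


def mean_words_per_sentence(text):
    """Average number of whitespace-separated tokens per sentence.

    Sentences are split naively on runs of '.', '!' and '?', which is only
    a rough approximation for real text.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)
```

To compare languages, you would run each function over comparable data
(e.g., corpora of similar size and genre) and compare the resulting
numbers, keeping in mind the limitations listed for each metric above.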

These are crude (but cheap) metrics sufficient for people with a focus on  
number crunching. Nothing that a philologist would be likely to accept,  
though. Everything beyond this is broadly in the realm of linguistic  
typology, so there is an entire sub-discipline of science working on the  
problem. If you want to go this way, you may ask this question again on
the LINGTYP list (LINGTYP at listserv.linguistlist.org).

Best,
Christian

On .05.2016, at 15:10, Bahman Jabbarian Amiri
<jabbarian at ut.ac.ir> wrote:

> Dear Folks
> As a person whose expertise is out of linguistics I was curious whether  
> there are some kinds of metrics in linguistics by which one might be
> able to measure and then compare different languages in view point of
> structure, composition, configuration and complexity.
> I would appreciate it if anyone might suggest paper or any reading  
> sources on this matter.
>
> Kind regards
> Bahman J. Amiri
-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931