[open-linguistics] Categories for data in the LLOD Cloud Diagram

Dave Lewis dave.lewis at cs.tcd.ie
Fri Jul 12 14:48:43 UTC 2013


Hi Hugh, all,

> True :: A published journal paper discussing a grammatical feature of 
> a minority language would be typed as a bitext.
>
>

I'd presume this would not count as bi-text, as the term usually denotes a 
set of aligned pairs of source and translation sentences, phrases or 
words that are the outcome of some translation process.

It raises an interesting question of whether bi-text should be a 
classification by itself, or is a characterization of the _link_ between 
two monolingual resources. The latter would be a bit more in line with 
how ELRA characterises resources (e.g. 
http://catalog.elra.info/index.php?language=en ). They support both 
monolingual and multilingual versions of lexica, corpora and terminology, 
though not speech and multimodal/multimedia resources.

Also, for multilingual resources the tag 'bitext' might be a bit 
misleading, as there could be links in multilingual corpora from source 
text to translation in more than one other language.
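
To make the "bi-text as a link" idea concrete, here is a rough sketch 
of how it might be modelled. All the class, field and function names 
are hypothetical ones invented for illustration; they are not taken 
from ELRA, the LLOD diagram or any existing vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonolingualResource:
    """A resource in one language (hypothetical model, for illustration)."""
    name: str
    language: str  # language code, e.g. "en"
    kind: str      # e.g. "corpus", "lexicon", "terminology"


@dataclass(frozen=True)
class TranslationLink:
    """An alignment from a source resource to a translation of it."""
    source: MonolingualResource
    target: MonolingualResource
    alignment: str = "sentence"  # could also be "phrase" or "word"


def target_languages(source, links):
    """Languages that `source` has been translated into, sorted."""
    return sorted({l.target.language for l in links if l.source == source})


def is_bitext(source, links):
    """True only when `source` is linked to exactly one other language."""
    return len(target_languages(source, links)) == 1


# One English source text with translations into two other languages.
en = MonolingualResource("report-en", "en", "corpus")
fr = MonolingualResource("report-fr", "fr", "corpus")
de = MonolingualResource("report-de", "de", "corpus")
links = [TranslationLink(en, fr), TranslationLink(en, de)]

print(target_languages(en, links))  # ['de', 'fr']
print(is_bitext(en, links))         # False: more than one target language
```

With this framing the category attaches to the links rather than to 
the resources, so a source with several translations is just a 
resource with several links and needs no separate tag.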

Another question is the classification of comparable text, i.e. text in 
two languages that wouldn't yield a clean bi-text alignment as it does 
not result from a sentence-by-sentence translation process, e.g. 
Wikipedia pages on the same topic authored in different languages, or 
transcreation of marketing material.

cheers,
Dave

> On Jul 11, 2013, at 11:13 AM, John McCrae wrote:
>
>> Hi all,
>>
>> We discussed today generating categories for the current LLOD 
>> diagram, shown here
>>
>> https://raw.github.com/jmccrae/llod-cloud.py/master/llod-cloud.july2013.png
>>
>> The proposal is that we should divide language resources into 6 broad 
>> categories
>>
>>   * Terminology and lexicon resources (tag: /lexical/)
>>       o e.g., Wiktionary derived resources
>>   * Typological Databases (tag: /typological/)
>>       o e.g., WALS
>>   * Translation Memories and Bitext (tag: /bitext/)
>>       o e.g., JRC Names
>>   * Annotated Corpora (tag: /annotated-corpus/)
>>       o e.g., Alpino
>>   * Multimodal resources (tag: /multimodal-corpus/)
>>       o Not sure if we have any examples as of yet
>>   * Metadata and linguistic categories (tag: /linguistic-metadata/)
>>       o e.g., ISOcat
>>
>> Does this seem like a sufficient division that would clarify the 
>> relative spread of the LLOD data, and does anyone have any other 
>> general comments?
>>
>> Regards,
>> John
>> _______________________________________________
>> open-linguistics mailing list
>> open-linguistics at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-linguistics
>> Unsubscribe: http://lists.okfn.org/mailman/options/open-linguistics
