[open-linguistics] Categories for data in the LLOD Cloud Diagram
Dave Lewis
dave.lewis at cs.tcd.ie
Fri Jul 12 14:48:43 UTC 2013
Hi Hugh, all,
> True :: A published journal paper discussing a grammatical feature of
> a minority language would be typed as a bitext.
>
>
I'd presume this would not count as bi-text, as that term usually denotes a
set of aligned pairs of source and translation sentences, phrases or
words that are the outcome of some translation process.
It raises an interesting question of whether bi-text should be a
classification by itself, or is a characterization of the _link_ between
two monolingual resources. The latter would be a bit more in line with
how ELRA characterises resources (e.g.
http://catalog.elra.info/index.php?language=en ). They support both
monolingual and multilingual versions of lexica, corpora and terminology
- though not speech and multimodal/multimedia resources.
Also, for multilingual resources the tag 'bitext' might be a bit
misleading, as there could be links in multilingual corpora from source
text to translation in more than one other language.
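To make the "link" reading concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names Resource, AlignmentLink and target_languages are hypothetical and do not come from ELRA, the LLOD diagram or any existing schema. Each resource stays monolingual, and the alignment is a separate object connecting two of them, so one source can be linked to translations in several languages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A monolingual resource (hypothetical model, not an ELRA or LLOD schema)."""
    name: str
    language: str  # language code, e.g. "en"


@dataclass(frozen=True)
class AlignmentLink:
    """An alignment between a source resource and one translation of it."""
    source: Resource
    target: Resource


def target_languages(source, links):
    """Return the sorted languages that `source` is aligned to."""
    return sorted({link.target.language for link in links if link.source == source})


# One English source aligned to two translations: more than a single "bi"-text.
en = Resource("press-release", "en")
de = Resource("press-release", "de")
fr = Resource("press-release", "fr")
links = [AlignmentLink(en, de), AlignmentLink(en, fr)]

print(target_languages(en, links))  # ['de', 'fr']
```

Under this reading a 'bitext' tag would describe each AlignmentLink rather than any single resource, which is one way the multilingual case could be covered without a new category.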
Another question is the classification of comparable text, i.e. text in
two languages that wouldn't yield a clean bi-text alignment as it does
not result from a sentence by sentence translation process, e.g.
wikipedia pages on the same topic authored in different languages, or
transcreation of marketing material.
cheers,
Dave
> On Jul 11, 2013, at 11:13 AM, John McCrae wrote:
>
>> Hi all,
>>
>> Today we discussed generating categories for the current LLOD
>> diagram, shown here
>>
>> https://raw.github.com/jmccrae/llod-cloud.py/master/llod-cloud.july2013.png
>>
>> The proposal is that we should divide language resources into 6 broad
>> categories
>>
>> * Terminology and lexicon resources (tag: /lexical/)
>> o e.g., Wiktionary derived resources
>> * Typological Databases (tag: /typological/)
>> o e.g., WALS
>> * Translation Memories and Bitext (tag: /bitext/)
>> o e.g., JRC Names
>> * Annotated Corpora (tag: /annotated-corpus/)
>> o e.g., Alpino
>> * Multimodal resources (tag: /multimodal-corpus/)
>> o Not sure if we have any examples as of yet
>> * Metadata and linguistic categories (tag: /linguistic-metadata/)
>> o e.g., ISOcat
>>
>> Does this seem like a sufficient division that would clarify the
>> relative spread of the LLOD data, and does anyone have any other
>> general comments?
>>
>> Regards,
>> John
>> _______________________________________________
>> open-linguistics mailing list
>> open-linguistics at lists.okfn.org <mailto:open-linguistics at lists.okfn.org>
>> http://lists.okfn.org/mailman/listinfo/open-linguistics
>> Unsubscribe: http://lists.okfn.org/mailman/options/open-linguistics