[open-linguistics] Categories for data in the LLOD Cloud Diagram

John McCrae jmccrae at cit-ec.uni-bielefeld.de
Fri Jul 12 15:37:37 UTC 2013


Hi

Dave, you raise some very good points. Perhaps the best idea is simply to have
a 'multilingual' tag? This would also work nicely, as it could be used to
identify other multilingual linked resources, which may be of interest to
the BPM-LOD group (http://www.w3.org/community/bpmlod/).

Regards,
John


On Fri, Jul 12, 2013 at 4:48 PM, Dave Lewis <dave.lewis at cs.tcd.ie> wrote:

>  Hi Hugh, all,
>
>  True :: A published journal paper discussing a grammatical feature of a
> minority language would be typed as a bitext.
>
>
>
> I'd presume this would not be true for bitext, as the term usually denotes a
> set of aligned pairs of source and translation sentences, phrases or words
> that are the outcome of some translation process.
>
> It raises an interesting question of whether bitext should be a
> classification by itself, or a characterization of the _link_ between
> two monolingual resources. The latter would be a bit more in line with how
> ELRA characterizes resources (e.g.
> http://catalog.elra.info/index.php?language=en ). They support both
> monolingual and multilingual versions of lexica, corpora and terminology,
> though not of speech and multimodal/multimedia resources.
>
> Also, for multilingual resources the tag 'bitext' might be a bit
> misleading, as there could be links in multilingual corpora from source
> text to translation in more than one other language.
>
> Another question is the classification of comparable text, i.e. text in
> two languages that wouldn't yield a clean bitext alignment because it does
> not result from a sentence-by-sentence translation process, e.g. Wikipedia
> pages on the same topic authored in different languages, or transcreation
> of marketing material.
>
> cheers,
> Dave
>
>
>  On Jul 11, 2013, at 11:13 AM, John McCrae wrote:
>
>  Hi all,
>
>  Today we discussed generating categories for the current LLOD diagram,
> shown here:
>
>
> https://raw.github.com/jmccrae/llod-cloud.py/master/llod-cloud.july2013.png
>
>  The proposal is that we should divide language resources into six broad
> categories:
>
>
>    - Terminology and lexicon resources (tag: *lexical*)
>     - e.g., Wiktionary derived resources
>    - Typological Databases (tag: *typological*)
>     - e.g., WALS
>    - Translation Memories and Bitext (tag: *bitext*)
>     - e.g., JRC Names
>    - Annotated Corpora (tag: *annotated-corpus*)
>     - e.g., Alpino
>    - Multimodal resources (tag: *multimodal-corpus*)
>     - Not sure if we have any examples as of yet
>    - Metadata and linguistic categories (tag: *linguistic-metadata*)
>     - e.g., ISOcat
>
> Does this seem like a sufficient division to clarify the relative
> spread of the LLOD data, and does anyone have any other general comments?
>
>  Regards,
> John
>  _______________________________________________
> open-linguistics mailing list
> open-linguistics at lists.okfn.org
> http://lists.okfn.org/mailman/listinfo/open-linguistics
> Unsubscribe: http://lists.okfn.org/mailman/options/open-linguistics
>

