[open-linguistics] Creation of a joint linguistic LOD cloud
Christian Chiarcos
christian.chiarcos at web.de
Thu Nov 3 19:32:00 UTC 2011
Of course, I will contribute my ontologies for linguistic annotations,
which formalize a number of annotation schemes and link them to ISOcat and
GOLD: http://purl.org/olia. (They are available online and will be
published under CC-BY as soon as the reference publication has appeared.)
Christian
On Thu, 03 Nov 2011 20:22:47 +0100, Christian Chiarcos
<christian.chiarcos at web.de> wrote:
> On Thu, 03 Nov 2011 16:28:53 +0100, Nancy Ide <ide at cs.vassar.edu> wrote:
>
>> For those of us who were not at the meeting, which type of datasets do
>> you want?
>
> We actually had this discussion briefly at the meeting as well
> (although I missed a lot, participating via Skype only). The general
> idea was to accept everything that can be reasonably linked to other
> linguistic resources such as corpora, dictionaries, thesauri, plain word
> lists, collocation data, etc. In the end, the contributors will decide
> on the actual definition; we wouldn't rule out anything.
>
> The crucial point is whether the data can be assumed to be usefully
> linked with other people's data, which is certainly true for corpora and
> lexical-semantic resources, but possibly not for results from
> psycholinguistic experiments, which are tied to a particular setup and
> stimuli (unless someone objects).
>
> As for myself, I am particularly interested in modeling linguistic
> corpora, and I can provide a corpus in RDF, with OWL/DL-defined data
> types. I also thought about converting MASC for this purpose. Other
> possibilities would be (parts of) the Open Parallel corpus
> (http://opus.lingfil.uu.se) or the Copenhagen Dependency Treebank
> (http://code.google.com/p/copenhagen-dependency-treebank).
> @Nancy: Is the RDF representation of MASC already available online?
> If so, I would focus on one of the latter corpora.
>
> A second question is how large the datasets have to be. Again, we
> wouldn't prescribe anything, so the providers themselves have to decide
> whether the amount of data they provide represents a reasonable
> starting base. For richly annotated corpora, for example, even small
> samples could be of interest, as the community still has to work out
> schemes for properly representing linguistic annotations (say, of
> parallel corpora or coreference-annotated corpora) in RDF and RDF-based
> formalisms.
>
> Best,
> Christian