[open-linguistics] Creation of a joint linguistic LOD cloud
Christian Chiarcos
christian.chiarcos at web.de
Thu Nov 3 19:22:47 UTC 2011
On Thu, 03 Nov 2011 16:28:53 +0100, Nancy Ide <ide at cs.vassar.edu> wrote:
> For those of us who were not at the meeting, which type of datasets do
> you want?
We briefly discussed this at the meeting as well (although I missed a lot,
as I participated only via Skype). The general idea was to accept anything
that can reasonably be linked to other linguistic resources, such as
corpora, dictionaries, thesauri, plain word lists, collocation data, etc.
In the end, the contributors will decide on the actual definition; we
wouldn't rule anything out.
The crucial point is whether the data can be expected to link usefully
with other people's data. This is certainly true for corpora and
lexical-semantic resources, but possibly not for results from
psycholinguistic experiments, which are tied to a particular setup and
set of stimuli (unless someone objects).
As for me, I am particularly interested in modeling linguistic corpora,
and I can provide a corpus in RDF with OWL/DL-defined data types. I also
thought about converting MASC for this purpose. Other possibilities would
be (parts of) OPUS, the Open Parallel Corpus
(http://opus.lingfil.uu.se), or the Copenhagen Dependency Treebank
(http://code.google.com/p/copenhagen-dependency-treebank).
@Nancy: Is the RDF representation of MASC already available online? If
so, I would focus on one of the other two corpora.
A second question is how large the datasets have to be. Again, we wouldn't
prescribe anything, so the providers themselves have to decide whether the
amount of data they provide is a reasonable starting point. For richly
annotated corpora, for example, even small samples could be of interest,
as the community still has to work out how to represent linguistic
annotations (say, parallel corpora or coreference-annotated corpora)
properly in RDF and RDF-based formalisms.
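To make the point about small annotated samples more concrete, here is a
minimal sketch of how a single POS-annotated sentence might be emitted as
RDF (N-Triples) using only the Python standard library. The namespaces
(http://example.org/corpus/, http://example.org/vocab#) and the property
names inSentence, string, pos and next are hypothetical placeholders, not
an established or proposed vocabulary; working out such a scheme is
exactly the open question above.

```python
# Minimal sketch: serialize a tiny POS-annotated sentence as N-Triples.
# The namespaces and property names below are hypothetical placeholders,
# not an established vocabulary.
BASE = "http://example.org/corpus/"   # hypothetical corpus namespace
VOCAB = "http://example.org/vocab#"   # hypothetical annotation vocabulary


def literal(text: str) -> str:
    """Return an N-Triples string literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def sentence_to_ntriples(sent_id: str, tokens: list[tuple[str, str]]) -> list[str]:
    """Emit triples for each token's sentence membership, surface string,
    POS tag, and (except for the last token) a link to the following token."""
    triples = []
    sent_uri = f"<{BASE}{sent_id}>"
    for i, (word, pos) in enumerate(tokens):
        tok_uri = f"<{BASE}{sent_id}_t{i}>"
        triples.append(f"{tok_uri} <{VOCAB}inSentence> {sent_uri} .")
        triples.append(f"{tok_uri} <{VOCAB}string> {literal(word)} .")
        triples.append(f"{tok_uri} <{VOCAB}pos> {literal(pos)} .")
        if i + 1 < len(tokens):
            triples.append(f"{tok_uri} <{VOCAB}next> <{BASE}{sent_id}_t{i + 1}> .")
    return triples


if __name__ == "__main__":
    for line in sentence_to_ntriples("s1", [("Data", "NNS"), ("links", "VBZ")]):
        print(line)
```

Because every token gets its own URI, other datasets (a lexicon, an
aligned translation, a coreference layer) could in principle point at the
same tokens, which is what makes even a small sample useful for linking.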
Best,
Christian