[open-linguistics] Creation of a joint linguistic LOD cloud

Nancy Ide ide at cs.vassar.edu
Thu Nov 3 16:31:57 UTC 2011


Well, you can certainly have MASC, which is 500K words of American English with annotations for all the basics (token, sentence, logical structure, POS/lemma) plus shallow parse and named entities. Some portions are also annotated for FrameNet frames, PropBank propositions, WordNet senses, Penn Treebank syntax (all of it will be annotated for PTB in a few months), and a few others. It is in GrAF format, but we have a GrAF-to-RDF transducer (as well as GrAF-to-CoNLL and GrAF-to-XML converters), plus modules to import GrAF into GATE, UIMA, and NLTK.

It is freely downloadable from http://www.anc.org/MASC. At the moment, only the first 82K words have validated annotations, but the rest will be released shortly.

The data is very clean; the annotations are all manually validated.


On Nov 3, 2011, at 12:12 PM, Sebastian Nordhoff wrote:

> On Thu, 03 Nov 2011 16:28:53 +0100, Nancy Ide <ide at cs.vassar.edu> wrote:
> 
>> For those of us who were not at the meeting, which types of datasets do you want?
> 
> all ;)
> 
> This is basically an exploratory project. We want to get an idea of what kinds of data the people in this group have, how they are annotated, what standards are observed, etc. We can then investigate how these data could be linked.
> 
> Datasets need not be particularly large, since it is more the structure of the data which is important at this point in time.
> 
> My sense is also that the data need not be very clean, since we want to know what is out there, and that includes an estimate of the data quality we could expect. If your data happen to be clean, that is obviously not a problem either ;)
> 
> Datasets should probably be provided in RDF, although I defer to more knowledgeable people on this list as to the preferred formats (I am more a linguist than a computer person).
> 
> I would expect that we get some dictionary data, some corpus data, some bibliographic data, and maybe some phonological-typological data (Steve?). Any other kind of linguistic data will also be welcome.
> 
> The project does not have a fixed structure as of yet, so suggestions as to the best course of action are welcome.
> 
> Best
> Sebastian
> 
> 
> 
>> 
>> 
>> On Nov 3, 2011, at 7:12 AM, Sebastian Nordhoff wrote:
>> 
>>> Dear all,
>>> At the real-life meeting at ISWC in Bonn, we discussed the current state of the working group and decided that it is time to create some resources and make them available to showcase the potential of LLOD.
>>> 
>>> We agreed to start a test case. This means that everybody should select one or two datasets of their choice and make them available to the group/to the world. We will then try to establish links between those resources and see how far we can get, and what other work might be required in order for the resources to become useful.
>>> 
>>> The deadline for the creation of these resources is DECEMBER 15.
>>> 
>>> It was my impression that everybody present at the meeting was willing to contribute. Unfortunately, I have no record of the resources to be provided. Could you please send an email a) confirming your participation and b) listing the kind of resources you will provide? People not present at the meeting are of course welcome to join in now.
>>> 
>>> We furthermore agreed to have regular telcos at intervals of either 1 month or 2 months, to be determined based on the outcome of the first resource creation drive.
>>> 
>>> Best wishes
>>> Sebastian
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> open-linguistics mailing list
>>> open-linguistics at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-linguistics
