Brian MacWhinney macw at cmu.edu
Thu May 31 12:33:08 UTC 2012

Dear Open-Linguistics,
    I have just subscribed to this list at the urging of Sebastian Hellman.  I was interested in the idea of incorporating the CHILDES and TalkBank corpora for spoken language into LOD, and Sebastian asked me why we were relying on the CC-NC license rather than the CC-BY-SA license.  I told him that the basic motivation involved the feelings of the people who had contributed data to the corpora.  Our data include audio, video, and transcripts from children, students, aphasics, etc. across many languages.  What we would like to avoid is the possibility that someone would find that a company was "making money" from the audio or video of their children or parents without properly asking them.  We have no commercial interests ourselves.  Isn't CC-NC the right choice in this case?  Is this a problem for the goals of LOD?  In general, we try to make our data as freely available to researchers as possible without any sort of license.  A small fraction of the corpora (3%) are password protected, but the others are not.

-- Brian MacWhinney

