[open-linguistics] Is this group doing anything for Open Data day?

Saul Albert saul.albert at eecs.qmul.ac.uk
Sun Jan 26 09:36:55 UTC 2014


Thanks Maud, that's really useful.

On Fri, Jan 24, 2014 at 02:48:13PM +0100, Maud Ehrmann wrote:
>    if it can be of interest, here come some pointers to a project with
>    similar concerns (for lexical resources and not corpora, though):
>    [1]http://wiki.creativecommons.org:8080/Grants/Assessing_the_effect_of_

The issues with ethnolinguistic research materials seem considerably
more difficult than those with WordNets, as the copyrights are further
complicated by all kinds of arcane permissions and license structures.

For example, a key resource for NLP in my department is the British
National Corpus (http://bnc.phon.ox.ac.uk), whose transcripts seem
to be governed by a restrictive and idiosyncratic license agreement
(http://www.natcorp.ox.ac.uk/docs/licence.html) with lots of
resource-specific and seemingly unenforceable clauses, e.g. (from clause 5,
"Requirement to Exercise Professional Care"):

"generally maintain ongoing interest and involvement in the use and
distribution of the BNC Processed Material as provided for herein."

What does that even mean? 

And these licenses were drawn up with reference to the original permission
request letters from the early 90s, which involve even more specific mazes of
rights holders to negotiate:
http://www.natcorp.ox.ac.uk/corpus/permletters.html#spoken1

It actually feels like the 90s in this area with all these odd crufty
licensing structures spawning in incompatible ways. 

On a more positive note, recent work on the raw audio data from the BNC
uses a CC-BY license (http://www.phon.ox.ac.uk/AudioBNC), so it might be
worth just waiting this one out and hoping that, as people produce new
corpora, they'll adopt sensible licensing models.

Ugh. 

Saul.


-- 
phone: us: +1(323)642-8943 | uk: +44(0)7941255210 / +44(0)2071007915  
skype:saulalbert | @saul | http://saulalbert.net  


