[open-linguistics] Collection of resources

Sat Jan 15 15:45:16 UTC 2011

Hi Nancy, happy to see that your post has now been published! I
read it with interest. There might be some misunderstanding though
about the relationship between sharealike (copyleft is a better
word, I think) and commercial use. Copyleft does not prevent
commercial use; in fact, it could be argued that it encourages
it. Consider:

  * There is no requirement to distribute a work, so a company
    can take copyleft data, mix it with whatever they want,
    and build a commercial service out of it.

  * An owner of a large copyleft corpus is free to license 
    it under different terms for a fee (though I would consider
    this unethical).

  * Because linguistics is a computationally intensive field,
    it is quite feasible to build a system on copyleft data
    and free software and make its functions (tagging, parsing,
    whatever) available as a paid-for service, with customers
    paying for the resource usage, not the data.

The only business model that is prevented is the one where 
one encloses the data and distributes it, trying to prevent
others from doing the same. That is, only business models that
rest entirely on legal fictions like intellectual property
aren't viable. Business models that are based around providing
services and access to scarce resources are quite feasible.

I don't see how linguistic data is special in this respect.
Similar concerns have come up, for example, with collaborative
cartography: projects such as OpenStreetMap combine data from
different sources themselves and are then often used as a
single aggregated data source in mashups, etc.

I very much look forward to the elaboration of these ideas
as they pertain to this field and am quite willing to be 
convinced by sound argument that linguistic data is somehow
different from other types of data, but I'll adopt a 
skeptical attitude for now. I should note that while I 
personally tend to be in favour of copyleft approaches, there
is considerable diversity of opinion on this point within OKF
and the wider community.

Cheers,
-w

* [2011-01-14 18:26:57 -0500] Nancy Ide <ide at cs.vassar.edu> wrote:

] Hmmm... this is problematic for linguistic data. Most of the things 
] in your list are restricted from commercial use--but of course,
] the "share-alike" restriction is basically a restriction to
] non-commercial use, since commercial users can't typically
] redistribute their products based on or incorporating the data 
] under the same conditions. Anything distributed through the 
] Linguistic Data Consortium has licensing of one kind or another, 
] which may in fact be different from the definition of open data on
] the web page. 

-- 
William Waites                <mailto:ww at styx.org>
http://eris.okfn.org/ww/         <sip:ww at styx.org>
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664