[open-linguistics] Fair use (US) and CC-BY-NC

Wed Apr 19 18:32:36 UTC 2017

Dear Víctor,

> For the US case, you may want to read the article at Stanford on the  
> four factors
> http://fairuse.stanford.edu/overview/fair-use/four-factors/
>
> But yes, the EU also acknowledges "fair use" to some extent,  
> termed here "copyright exceptions". The EU Copyright Directive gives  
> some guidelines in Art. 5, basically imposing these three conditions  
> for copyright exceptions to apply ("fair use"):
>> (i) it covers only special cases (copies made for scientific research  
>> and teaching purposes; uses in educational institutions or quotation);
Yes. Germany implements §5.3(a) (and, partially, §5.3(n)), but the key  
difference from fair use is that (in Germany) this is "limited to a  
pre-defined group of people" ("ausschließlich für einen bestimmt  
abgegrenzten Kreis von Personen",  
https://www.gesetze-im-internet.de/urhg/__52a.html). Interpreted in a  
strict (and probably correct) sense, this means that I can distribute to  
any individual I can name in advance (e.g., a student in a particular  
class, or a specific collaborator in a project), but not to other people  
that I cannot name right now but that have the same characteristics (e.g.,  
future collaborators, students or any extensible group of people, e.g.,  
"NLP researchers", "OWLG members"). Effectively, I cannot distribute the  
data other than by personal agreement. If I die, the data is probably  
locked for the next few decades because I couldn't give it a proper  
license, either.
>> (ii) it does not conflict with the normal exploitation of the work and;  
>> (iii) it does not unreasonably prejudice the legitimate interests of  
>> the author.
> The nuances and differences in the implementation of the Copyright  
> Directive by the different Member States are beautifully represented at  
> this link:
> http://copyrightexceptions.eu/
>
> So, in my opinion, you only have to ask yourself whether the editor  
> of the annotated Bible will complain that its sales figures have  
> decreased.

I already know that won't happen (or if it does, the cost will be  
marginal). At least for those whose (printed) texts are in the public  
domain, only the electronic edition may be protected (and only if its  
creation represented a creative act -- this may be debatable). And those  
Bibles are actually freely available from a number of portals (albeit not  
in the specific format I want to encode them in, nor with any annotations),  
just not with proper license statements (and thus copyright-protected), so  
I don't see any financial damage done. Our legal department would be  
*much* more cautious, though.

> And in no case whatsoever can you re-license a work that is not yours.

Of course not. I should have been more precise: I want to distribute my  
modifications/annotations, and I do not want to claim responsibility for  
anything else. It is just that these cannot be redistributed without the  
original data,* so I would need to disseminate them as a bundle, under  
terms that preserve the conditions under which I obtained the  
original data (i.e., US fair use -- but I don't see any plausible way I  
can enforce US law on something distributed from Germany to users  
outside the US). My idea was that CC-BY-NC could be a reasonable  
approximation, but this may be wrong.

Best,
Christian

* Actually, they can be distributed as standoff annotations, but only to a  
limited degree (how to encode corrections, e.g., a typo -- distributing  
diffs? -- these are derived works, and thus copyright-protected, as well),  
and this is neither technically scalable (different formats at different  
portals) nor sustainable (data providers guarantee neither URL nor format  
stability).

> Indeed, this is only my opinion and I would be happy to hear a more  
> qualified voice.
>
> Regards,
> Víctor
>
> El 15/04/2017 a las 15:22, Christian Chiarcos escribió:
>> Dear colleagues,
>> a few years back, I compiled a massive corpus of Bibles and related  
>> texts in a CES-conformant XML format (following Resnik 1996), some also  
>> with annotations. For the most part, distributing this corpus would be  
>> illegal under European copyright law (and that's why you haven't heard  
>> about it), but I realized that there are circumstances which could allow 
>> dissemination of a great part of it under an academic license.
>> Compiling and distributing a web corpus is basically illegal in Europe 
>> unless explicitly permitted by an accompanying license. However, US law 
>> has the concept of fair use, and if a data provider declares US 
>> legislation to apply (e.g., that "[t]hese Terms and Conditions ... are 
>> governed by the laws of the State of New York"), we Europeans can rely  
>> on the principle of fair use, as well.
>> According to 17 U.S.C. § 107, "the fair use of a copyrighted work, 
>> including such use by reproduction in copies or phonorecords or by any 
>> other means specified by that section, for purposes such as criticism, 
>> comment, news reporting, teaching (including multiple copies for  
>> classroom use), scholarship, or research, is not an infringement of  
>> copyright." The intended use is for NLP research, DH scholarship and  
>> classroom use, so that would probably not be an issue -- and in fact,  
>> there is no financial damage whatsoever as this data is freely and  
>> redundantly available from the web.
>> However, am I allowed to distribute this corpus with an explicit license 
>> statement? I think CC-BY-NC should protect the intellectual and  
>> commercial interests of the creator of the electronic edition and be  
>> roughly in the spirit of an academic license, but of course, I'm not the  
>> actual owner of the data, but only responsible for its transformation  
>> and annotation. I am wondering about the consequences if someone  
>> eventually creates an NLP toolchain from this data and uses any models  
>> trained on the data in a commercial application. As the original  
>> copyright extends to derived works, this would be a clear violation of  
>> my license statement, of course, but I would be responsible as I  
>> redistributed the data and by transforming it from messy HTML to proper  
>> markup, I actually enabled this violation.
>> Looking forward to your opinion ;)
>> Best,
>> Christian
>
>
> --
> Víctor Rodríguez-Doncel
> D3205 - Ontology Engineering Group (OEG)
> Departamento de Inteligencia Artificial
> ETS de Ingenieros Informáticos
> Universidad Politécnica de Madrid
>
> Campus de Montegancedo s/n
> Boadilla del Monte-28660 Madrid, Spain
> Tel. (+34) 91336 3753
> Skype: vroddon3

-- 
Prof. Dr. Christian Chiarcos
Applied Computational Linguistics
Johann Wolfgang Goethe Universität Frankfurt a. M.
60054 Frankfurt am Main, Germany

office: Robert-Mayer-Str. 10, #401b
mail: chiarcos at informatik.uni-frankfurt.de
web: http://acoli.cs.uni-frankfurt.de
tel: +49-(0)69-798-22463
fax: +49-(0)69-798-28931