[open-linguistics] Fair use (US) and CC-BY-NC

Thu Apr 20 07:13:11 UTC 2017

Dear all,
I always take a three-step approach:

1) is it morally OK?
2) is it legally permitted?
3) what are the risks to whom when something is not clearly 100% legal?

If your plan fails 1), there is no need to proceed, but I think that in
this case, Christian's plan is morally defensible.

So is it legally permitted? That is unclear, because it is not clear to what
extent US fair use can be bundled with additional material under German law.
One solution I could see is to host the texts on a US server and the
annotations on a European server.

Which takes us to the risks: someone downstream could be sued by the
rights holders for copyright infringement if they cannot claim fair use.
In order to mitigate that risk, Christian suggests an NC license. I see
this as a kind of warning sign: this content is not completely free for
any purpose! The aim would be to deter all too liberal use of the data.

I have done something similar for Glottolog, when we could only
distribute bibliographic data about aboriginal languages under some very
special conditions (which would not translate into any known license).
There was no real legal risk as the bibliographical data were not
copyrightable to begin with in my view, but as a kind of flag I used NC
there, too.

This is obviously not the basic idea of NC, but I find it interesting
that both Christian and I have used NC as a "taint marker" to find a
pragmatic solution.

Best wishes
Sebastian

On 04/19/2017 08:32 PM, Christian Chiarcos wrote:
> Dear  Víctor,
>
>     For the US case, you may want to read the article at Stanford on the four
>     factors
>     http://fairuse.stanford.edu/overview/fair-use/four-factors/
>
>     But yes, the EU also acknowledges "fair use" to some extent, termed
>     here "copyright exceptions". The EU Copyright directive gives some
>     guidelines in Art. 5, basically imposing these three conditions for
>     copyright exceptions to apply ("fair use"):
>
>         (i) it covers only special cases (copies made for scientific research
>         and teaching purposes; uses in educational institutions or quotation);
>
> Yes. Germany implements §5.3(a) (and, partially, §5.3(n)), but the key
> difference to fair use is that (in Germany) this is "limited to a pre-defined
> group of people" ("ausschließlich für einen bestimmt abgegrenzten Kreis von
> Personen", https://www.gesetze-im-internet.de/urhg/__52a.html). Interpreted in a
> strict (and probably correct) sense, this means that I can distribute to any
> individual I can name in advance (e.g., a student in a particular class, or a
> specific collaborator in a project), but not to other people that I cannot name
> right now but that have the same characteristics (e.g., future collaborators,
> students or any extensible group of people, e.g., "NLP researchers", "OWLG
> members"). Effectively, I cannot distribute the data other than by personal
> agreement. If I die, the data is probably locked for the next few decades
> because I couldn't give it a proper license, either.
>
>         (ii) it does not conflict with the normal exploitation of the work and;
>         (iii) it does not unreasonably prejudice the legitimate interests of the
>         author.
>
>     The nuances and differences in the implementation of the Copyright directive
>     by the different Member States are beautifully represented in this link:
>     http://copyrightexceptions.eu/
>
>     So, in my opinion, you only have to ask yourself whether the editor of
>     the annotated Bible will complain about a drop in its sales figures.
>
>
> I already know that won't happen (or if it will, the costs will be marginal). At
> least for those whose (printed) texts are in the public domain, only the
> electronic edition may be protected (but only if its creation represented a
> creative act -- this may be debatable). And those bibles are actually freely
> available from a number of portals (albeit not in the specific format I want to
> encode it in nor with any annotations), just not with proper license statements
> (and thus copyright-protected), so I don't see any financial damage done. Our
> legal department would be *much* more cautious, though.
>
>     And in no case whatsoever can you re-license a work that is not yours.
>
>
> Of course not. I should have been more precise: I want to distribute my
> modifications/annotations and I do not want to claim responsibility for anything
> else. It is just that these cannot be redistributed without the original data,*
> so I would need to disseminate them as a bundle, under terms that preserve
> the conditions under which I obtained the original data (i.e., US fair use --
> but I don't see any plausible way I can enforce US law onto something
> distributed from Germany to users outside the US). My idea was that CC-BY-NC
> could be a reasonable approximation, but this may be wrong.
>
> Best,
> Christian
>
> * Actually, they can as standoff, but only to a limited degree (how to encode
> corrections, e.g., a typo -- distributing diffs? -- these are derived works, and
> thus copyright-protected, as well), and this is neither technically scalable
> (different formats at different portals) nor sustainable (data providers do not
> guarantee URL or format stability).
>
>     Indeed, this is only my opinion and I would be happy to hear a more
>     qualified voice.
>
>     Regards,
>     Víctor
>
>     On 15/04/2017 at 15:22, Christian Chiarcos wrote:
>>     Dear colleagues,
>>
>>     a few years back, I compiled a massive corpus of Bibles and related texts
>>     in a CES-conformant XML format (following Resnik 1996), some also with
>>     annotations. For the most part, distributing this corpus would be illegal
>>     under European copyright law (and that's why you haven't heard about it),
>>     but I realized that there are circumstances which could allow
>>     dissemination of a great part of it under an academic license.
>>
>>     Compiling and distributing a web corpus is basically illegal in Europe
>>     unless explicitly permitted by an accompanying license. However, US law
>>     has the concept of fair use, and if a data provider declares US
>>     legislation to apply (e.g., that "[t]hese Terms and Conditions ... are
>>     governed by the laws of the State of New York"), we Europeans can rely on
>>     the principle of fair use, as well.
>>
>>     According to 17 U.S.C. § 107, "the fair use of a copyrighted work,
>>     including such use by reproduction in copies or phonorecords or by any
>>     other means specified by that section, for purposes such as criticism,
>>     comment, news reporting, teaching (including multiple copies for classroom
>>     use), scholarship, or research, is not an infringement of copyright." The
>>     intended use is for NLP research, DH scholarship and classroom use, so
>>     that would probably not be an issue -- and in fact, there is no financial
>>     damage whatsoever as this data is freely and redundantly available from
>>     the web.
>>
>>     However, am I allowed to distribute this corpus with an explicit license
>>     statement? I think CC-BY-NC should protect the intellectual and commercial
>>     interests of the creator of the electronic edition and be roughly in the
>>     spirit of an academic license, but of course, I'm not the actual owner of
>>     the data, but only responsible for its transformation and annotation. I am
>>     wondering about the consequences if someone eventually creates an NLP tool
>>     chain from this data and uses any models trained on the data in a
>>     commercial application. As the original copyright extends to derived
>>     works, this would be a clear violation of my license statement, of course,
>>     but I would be responsible as I redistributed the data and by transforming
>>     it from messy HTML to proper markup, I actually enabled this violation.
>>
>>     Looking forward to your opinion ;)
>>
>>     Best,
>>     Christian
>
>
>     --
>     Víctor Rodríguez-Doncel
>     D3205 - Ontology Engineering Group (OEG)
>     Departamento de Inteligencia Artificial
>     ETS de Ingenieros Informáticos
>     Universidad Politécnica de Madrid
>
>     Campus de Montegancedo s/n
>     Boadilla del Monte-28660 Madrid, Spain
>     Tel. (+34) 91336 3753
>     Skype: vroddon3
>
>
>
>
> --
> Prof. Dr. Christian Chiarcos
> Applied Computational Linguistics
> Johann Wolfgang Goethe Universität Frankfurt a. M.
> 60054 Frankfurt am Main, Germany
>
> office: Robert-Mayer-Str. 10, #401b
> mail: chiarcos at informatik.uni-frankfurt.de
> web: http://acoli.cs.uni-frankfurt.de
> tel: +49-(0)69-798-22463
> fax: +49-(0)69-798-28931