[open-linguistics] Defining "Openness" for Linguistic Linked Open Data

Wed Jan 17 10:09:34 UTC 2018

Dear Christian,

Even if the very first star of the "5 star linked data" quality scheme [1] 
requires that the license be open, some people would like to soften 
this requirement, so that the O in LOD would simply mean "open format". 
Furthermore, the European Union is striving to establish data markets, 
where licenses obviously cannot be open and where linked data may play 
a role (it is actively funding such research projects right now [2]).

In my opinion, limiting the LLOD to strictly open datasets is a mistake, 
as it would depict only part of the reality. The webpage at 
http://linguistic-lod.org/llod-cloud already displays the cloud by 
license; I cannot imagine how that could be improved...

Regards,
Víctor

[1] https://www.w3.org/DesignIssues/LinkedData.html

[2] ICT-13-2018-2019: Supporting the emergence of data markets and the 
data economy
https://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/ict-13-2018-2019.html

On 16/01/2018 at 13:32, Christian Chiarcos wrote:
> Dear all,
>
> when we first began developing the Linguistic Linked Open Data cloud 
> diagram, we followed a highly permissive approach to the criteria for 
> inclusion, with the idea of moving it from an abstract vision to a set 
> of actually usable resources -- in fact the first versions of the 
> diagram (before the MLODE workshop in September 2012) are explicitly 
> referred to as "drafts" because we included resources whose conversion 
> to LOD had only been *promised* at the time.
>
> However, the quality criteria have been continuously enforced since 
> then. This includes availability, size, number of links, and an 
> explicit definition of linguistic relevance as an entry criterion, so 
> that these are now roughly equivalent to the LOD criteria.
>
> Along with that, we did *not* enforce an Open Definition-conformant 
> license (http://opendefinition.org/licenses/). In particular, 
> arguments have been brought forward to include non-commercial 
> resources. One of the reasons is that many classical resources 
> developed during the 1990s and early 2000s are released under 
> "academic" licenses and that even today, entire sub-communities in 
> linguistics tend to be very protective about their data. Encouraging 
> noncommercial licenses is a viable compromise to reach out to these 
> communities without compromising the idea of embracing openness 
> altogether. We did have discussions about this from the very 
> beginning, and there are good arguments for either view, but we did 
> *not* manage to establish a consensus to exclude, in particular, 
> NC-licensed data.
>
> For the moment, openness is (implicitly) defined as being in line with 
> the LOD diagram, i.e., we inherit its view that "we take a liberal 
> view of what we consider “open”. If the data is openly accessible from 
> a network point of view – that is, it's not behind an authorization 
> check or paywall" (http://lod-cloud.net/). This approach can be 
> criticized for good reasons, but it is an established and transparent 
> practice that goes back to the original LOD diagram by Cyganiak and 
> Jentzsch, and that has also been documented since then.
>
> Part of this documentation is that under 
> http://linguistic-lod.org/llod-cloud, users can get an alternative 
> visualization of the diagram with respect to licenses, and as can be 
> easily seen, about half of the LLOD bubbles are non-commercial, three 
> have no explicit license (which, in Germany at least, means a 
> restrictive license), and three more are labeled as "closed" (which may 
> in fact mean that different sub-resources have different licenses, 
> e.g., Multext-East (http://nl.ijs.si/ME/V4/), which includes CC-BY-SA 
> and CC-BY-NC lexica as well as corpus data under a 
> restricted/non-commercial license).
>
> However, this can be a problem for data providers who find their NC 
> data in the (L)LOD diagram even though it is not "Open" according to 
> the Open Definition, as users of this data may get a wrong impression about 
> their usage rights -- despite warnings such as "Before using any data, 
> you should always check the publisher's website for the terms and 
> conditions" (http://lod-cloud.net/).
>
> The question now is what to do about this situation. Personally, I 
> would prefer to largely stay with the current practice for the LOD and 
> LLOD diagrams for the moment, but to provide an explicit statement 
> that *our* definition of openness goes beyond the Open Definition 
> by including non-commercial/"academic" resources, because this is an 
> explicit need in (parts of) our community. At the same time, given 
> such a statement, resources with unclear (= restrictive) licenses 
> should be removed from the diagram. As these are quantitatively 
> marginal anyway, this should not affect the usability of LLOD 
> resources and the diagram in comparison to its current state.
>
> In any case, this is for the immediate future only. At some point in 
> the future, after intense lobbying among our peers and (hopefully) 
> growing importance of Open Definition-compliant licenses, we should 
> certainly adopt a stricter definition, but for the moment, growing the 
> number of resources, demonstrating their use and developing applications 
> of (L)LOD should -- IMHO -- take priority over ideological purity until 
> it is established as a conventional approach for (certain kinds of) 
> linguistic data.
>
> This may be controversial, though, so, what do others think?
>
> Best,
> Christian

-- 
Víctor Rodríguez-Doncel
D3205 - Ontology Engineering Group (OEG)
Departamento de Inteligencia Artificial
ETS de Ingenieros Informáticos
Universidad Politécnica de Madrid

Campus de Montegancedo s/n
Boadilla del Monte-28660 Madrid, Spain
Tel. (+34) 91336 3672
Skype: vroddon3