[open-linguistics] Defining "Openness" for Linguistic Linked Open Data
Víctor Rodríguez Doncel
vrodriguez at fi.upm.es
Wed Jan 17 10:09:34 UTC 2018
Dear Christian,
Even though the very first star of the high-quality "5 star linked data" [1]
requires that the license be open, there are some people who would like
to soften this requirement, making the O in LOD simply mean "open
format". Furthermore, the European Union is striving to establish
data markets, where licenses obviously cannot be open and where linked
data may play a role (as shown by the research projects it is actively
funding right now [2]).
In my opinion, limiting the LLOD to strictly open datasets is a mistake,
as it would depict only part of the reality. The webpage at
http://linguistic-lod.org/llod-cloud already displays the cloud by
license; I cannot possibly imagine how to improve that...
Regards,
Víctor
[1] https://www.w3.org/DesignIssues/LinkedData.html
[2] ICT-13-2018-2019: Supporting the emergence of data markets and the
data economy
https://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/ict-13-2018-2019.html
On 16/01/2018 at 13:32, Christian Chiarcos wrote:
> Dear all,
>
> when we first began developing the Linguistic Linked Open Data cloud
> diagram, we followed a highly permissive approach to the criteria for
> inclusion, with the idea of moving it from an abstract vision to a set
> of actually usable resources -- in fact the first versions of the
> diagram (before the MLODE workshop in September 2012) are explicitly
> referred to as "drafts" because we included resources whose conversion
> to LOD had only been *promised* at the time.
>
> However, the quality criteria have been continuously enforced since
> then. This includes availability, size, number of links, and an
> explicit definition of linguistic relevance as an entry criterion, so
> that these are now roughly equivalent to the LOD criteria.
>
> Along with that, we did *not* enforce an Open Definition-conformant
> license (http://opendefinition.org/licenses/). In particular,
> arguments have been brought forward to include non-commercial
> resources. One of the reasons is that many classical resources
> developed during the 1990s and early 2000s are released under
> "academic" licenses and that even today, entire sub-communities in
> linguistics tend to be very protective about their data. Encouraging
> non-commercial licenses is a viable compromise to reach out to these
> communities without abandoning the idea of embracing openness
> altogether. We did have discussions about this from the very
> beginning, and there are good arguments for either view, but we did
> *not* manage to establish a consensus to exclude, in particular,
> NC-licensed data.
>
> For the moment, openness is (implicitly) defined as being in line with
> the LOD diagram, i.e., we inherit its view that "we take a liberal
> view of what we consider “open”. If the data is openly accessible from
> a network point of view – that is, it's not behind an authorization
> check or paywall" (http://lod-cloud.net/). This approach can be
> criticized for good reasons, but it is an established and transparent
> practice that goes back to the original LOD diagram by Cyganiak and
> Jentzsch, and that has also been documented since then.
>
> Part of this documentation is that under
> http://linguistic-lod.org/llod-cloud, users can get an alternative
> visualization of the diagram with respect to licenses, and as can
> easily be seen, about half of the LLOD bubbles are non-commercial, three
> have no explicit license (which, in Germany at least, amounts to a
> restrictive license), and three more are labeled as "closed" (which may
> in fact mean that different sub-resources have different licenses,
> e.g., Multext-East [http://nl.ijs.si/ME/V4/], which includes CC-BY-SA
> and CC-BY-NC lexica as well as corpus data under a
> restricted/non-commercial license).
>
> However, this can be a problem for data providers who find their NC
> data in the (L)LOD diagram even though it is not "Open" according to
> the Open Definition, as users of this data may get a wrong impression
> about their usage rights -- despite warnings such as "Before using any data,
> you should always check the publisher's website for the terms and
> conditions" (http://lod-cloud.net/).
>
> The question now is what to do about this situation. Personally, I
> would prefer to roughly stay with the current practice for the LOD and
> LLOD diagrams for the moment, but to provide an explicit statement
> that *our* definition of openness goes beyond the Open Definition
> by including non-commercial/"academic" resources, because this is an
> explicit need in (parts of) our community. At the same time, given
> such a statement, resources with unclear (= restrictive) licenses
> should be removed from the diagram. As these are quantitatively
> marginal anyway, this should not affect the usability of LLOD
> resources and the diagram in comparison to its current state.
>
> In any case, this is for the immediate future only. At some point,
> after intense lobbying among our peers and (hopefully) the growing
> importance of Open Definition-compliant licenses, we should
> certainly adopt a stricter definition, but for the moment, growing the
> number of resources, demonstrating their use and developing applications
> of (L)LOD should -- IMHO -- take priority over ideological purity until
> (L)LOD is established as a conventional approach for (certain kinds of)
> linguistic data.
>
> This may be controversial, though, so, what do others think?
>
> Best,
> Christian
--
Víctor Rodríguez-Doncel
D3205 - Ontology Engineering Group (OEG)
Departamento de Inteligencia Artificial
ETS de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo s/n
Boadilla del Monte-28660 Madrid, Spain
Tel. (+34) 91336 3672
Skype: vroddon3