[open-linguistics] Developing a consolidated LOD vocabulary for linguistic annotations

Christian Chiarcos christian.chiarcos at web.de
Wed Jan 29 10:31:03 UTC 2020


Dear all,

I'm forwarding another post on the topic by Sebastian Hellmann that didn't
seem to get delivered properly when he sent it himself. I am also CCing the
new open linguistics mailing list (as a test, basically). Details about the
new mailing list and the open linguistics forum will follow later today.

Best,
Christian

-------- Forwarded Message --------
Subject: Re: [open-linguistics] Developing a consolidated LOD vocabulary
for linguistic annotations
Date: Thu, 23 Jan 2020 11:24:06 +0100
From: Sebastian Hellmann <hellmann at informatik.uni-leipzig.de>
To: A list for those interested in open data in linguistics.
<open-linguistics at s116.okserver.org>,
A list for those interested in open data in linguistics.
<open-linguistics at lists.okfn.org>

Hi Christian, all,

Two things from our side that should make LOD vocabularies, and LOD in
general, more robust:

1. We implemented Databus Collections. After registering, you can create a
collection of Databus datasets in your space, i.e. a personal DCAT catalogue,
which is also available as Linked Data. The collection URI can be used, e.g.,
in papers and for automatic processing. We implemented loading of collections
into a Virtuoso SPARQL endpoint. This will be used to update all DBpedia
language chapters more easily, i.e. de.dbpedia.org/sparql and
de.dbpedia.org/resource/Berlin, see
https://forum.dbpedia.org/t/simplifying-the-chapter-endpoint-deployment/145
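
As a rough illustration of the "automatic processing" part: since a collection
is just Linked Data, a script can dereference the collection URI and inspect
the DCAT catalogue it returns. The Python sketch below (requests + rdflib)
assumes the collection URI answers content negotiation for Turtle; the URI and
path are placeholders, not an official Databus API.

    import requests
    from rdflib import Graph

    # Hypothetical collection URI; substitute your own Databus collection.
    collection_uri = "https://databus.dbpedia.org/janedoe/collections/linguistic-lod"

    # Fetch the Linked Data (DCAT catalogue) representation of the collection.
    response = requests.get(collection_uri, headers={"Accept": "text/turtle"})
    response.raise_for_status()

    graph = Graph()
    graph.parse(data=response.text, format="turtle")

    # Print all statements; adapt this to the DCAT terms the Databus actually
    # uses, e.g. to extract the download URLs of the datasets in the collection.
    for subject, predicate, obj in graph:
        print(subject, predicate, obj)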

The loading process uses the Databus Client, which handles download,
compression, isomorphic formats, and even CSV-to-RDF conversion if a mapping
is given. This means that you can load an ontology in .owl, a dataset in
.nt.bz2, and a CSV (if a mapping is added) directly from the bus into a
server database. Later we might also implement direct hosting, i.e. you pay
20-100€ per month for the server and SPARQL and Linked Data are then loaded
and updated there automatically.
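
To make this concrete, here is a rough Python sketch of the kind of work the
client automates: downloading a compressed N-Triples dump, decompressing it,
and converting a CSV with a trivial column-to-property mapping, before writing
one merged file for bulk loading. File names, URLs, and the mapping are made
up for illustration; the real Databus Client has its own declarative mapping
files and command-line interface.

    import bz2
    import csv

    import requests
    from rdflib import Graph, Literal, Namespace, URIRef

    # 1. Download and decompress an RDF dump (hypothetical URL).
    dump_url = "https://example.org/dataset.nt.bz2"
    raw = requests.get(dump_url).content
    graph = Graph()
    graph.parse(data=bz2.decompress(raw).decode("utf-8"), format="nt")

    # 2. Convert a local CSV with a simple column -> property mapping
    #    (hypothetical file and columns).
    EX = Namespace("https://example.org/ontology/")
    mapping = {"name": EX["name"], "population": EX["population"]}

    with open("cities.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            subject = URIRef("https://example.org/resource/" + row["id"])
            for column, prop in mapping.items():
                graph.add((subject, prop, Literal(row[column])))

    # 3. Serialize the merged graph for bulk loading into the server database.
    graph.serialize(destination="merged.nt", format="nt")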

People are already starting to upload their datasets and load them together
with DBpedia from the bus. We are still in a beta phase here:
https://forum.dbpedia.org/t/databus-cannot-add-new-dataset-to-collection/342

2. We have already tested versioning of ontologies on the Databus:
https://databus.dbpedia.org/denis/ontology/dbo-snapshots

This creates a new Databus version whenever the ontology changes, so you
have a full backlog, diffs, and stable version IDs. We intend to crawl ALL
ontology URLs in this way and create a versioned mirror of changes.
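
To sketch the idea behind such a versioned mirror: a crawler only needs to
detect that the ontology file changed and then mint a new, stable version ID.
The snippet below uses a simple content hash and a local state file; the
actual Databus release and versioning machinery is more involved, and the URL
here is a placeholder.

    import hashlib
    import json
    from datetime import datetime, timezone
    from pathlib import Path

    import requests

    ONTOLOGY_URL = "https://example.org/ontology.owl"  # placeholder
    STATE_FILE = Path("ontology-state.json")

    def fetch_hash(url: str) -> str:
        """Download the ontology and return the SHA-256 of its bytes."""
        return hashlib.sha256(requests.get(url).content).hexdigest()

    def check_for_new_version() -> None:
        current = fetch_hash(ONTOLOGY_URL)
        previous = json.loads(STATE_FILE.read_text())["hash"] if STATE_FILE.exists() else None
        if current != previous:
            # A changed hash would trigger a new, datestamped version on the bus
            # (the publishing step itself is done with the Databus tooling).
            version_id = datetime.now(timezone.utc).strftime("%Y.%m.%d-%H%M%S")
            print("Ontology changed, new version:", version_id)
            STATE_FILE.write_text(json.dumps({"hash": current, "version": version_id}))
        else:
            print("No change; keeping the previous version.")

    check_for_new_version()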

Based on these fixed artifact and version IDs, we will be able to create
mappings between ontologies that can be managed centrally. Normally there
should be a default mapping, and then there can be variants as well.

There can also be detection methods for ontology consistency, as well as
checks for whether mappings are missing or whether they break when the
ontology is updated.
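
As an illustration of the "does the mapping break" check: given a mapping
whose source terms refer to a fixed artifact and version, one can verify that
every mapped term is still declared in a freshly crawled ontology version.
The rdflib-based sketch below uses placeholder URIs and file names and assumes
the ontology is serialized as RDF/XML; the central mapping registry itself is
not part of the sketch.

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL, RDF

    # Hypothetical mapping: source ontology term -> target ontology term.
    mapping = {
        URIRef("https://example.org/ontoA/Token"): URIRef("https://example.org/ontoB/Word"),
    }

    # Load the new version of the source ontology (placeholder file name).
    onto = Graph()
    onto.parse("ontoA-new-version.owl", format="xml")

    # Collect everything declared as a class or an object property.
    declared = set(onto.subjects(RDF.type, OWL.Class))
    declared |= set(onto.subjects(RDF.type, OWL.ObjectProperty))

    broken = [term for term in mapping if term not in declared]
    if broken:
        print("Mapping breaks on this update; missing terms:")
        for term in broken:
            print("  ", term)
    else:
        print("All mapped source terms are still present.")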

-- Sebastian


On 18.01.20 11:43, Christian Chiarcos wrote:

Dear all,

With this email, I would like to ask for interest in the development of a
consolidated LOD vocabulary for linguistic annotations, for applications
across language technology, empirical linguistics, computational
lexicography, digital humanities, etc. There are numerous vocabularies
available for this purpose, most notably Web Annotation (used more in BioNLP
and DH) and NIF (used more frequently in NLP), and continued support for
both seems to be desired by their respective user communities. Yet they are
neither fully interoperable with each other, nor do they cover all relevant
use cases* or provide the capabilities of more generic formats.**
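
To give a flavour of the interoperability issue: the same character span is
modelled quite differently in the two vocabularies. The rdflib sketch below
builds a minimal NIF annotation and a minimal Web Annotation for the token
"Berlin" in the same sentence; the URIs are placeholders and many details of
both models (annotation bodies, string identifiers, etc.) are omitted.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, XSD

    NIF = Namespace("http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#")
    OA = Namespace("http://www.w3.org/ns/oa#")
    EX = Namespace("https://example.org/")  # placeholder namespace

    text = "Berlin is a city."
    g = Graph()

    # NIF: a string anchored by character offsets within a context.
    context = EX["doc1#char=0," + str(len(text))]
    span = EX["doc1#char=0,6"]
    g.add((context, RDF.type, NIF.Context))
    g.add((context, NIF.isString, Literal(text)))
    g.add((span, RDF.type, NIF.String))
    g.add((span, NIF.referenceContext, context))
    g.add((span, NIF.beginIndex, Literal(0, datatype=XSD.nonNegativeInteger)))
    g.add((span, NIF.endIndex, Literal(6, datatype=XSD.nonNegativeInteger)))
    g.add((span, NIF.anchorOf, Literal("Berlin")))

    # Web Annotation: an annotation targeting the same offsets via a selector.
    anno, target, selector = EX["anno1"], EX["target1"], EX["sel1"]
    g.add((anno, RDF.type, OA.Annotation))
    g.add((anno, OA.hasTarget, target))
    g.add((target, RDF.type, OA.SpecificResource))
    g.add((target, OA.hasSource, EX["doc1"]))
    g.add((target, OA.hasSelector, selector))
    g.add((selector, RDF.type, OA.TextPositionSelector))
    g.add((selector, OA.start, Literal(0, datatype=XSD.nonNegativeInteger)))
    g.add((selector, OA.end, Literal(6, datatype=XSD.nonNegativeInteger)))

    print(g.serialize(format="turtle"))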

I am in the process of reaching out to different communities to ask for
expressions of interest in discussing this further (for the moment, by
private email to me). So, if this is of any interest to you, please let me
know. If, say, at least five possible contributors can be found, I would set
up a Doodle poll to organize a joint call. The goal of that call would be to
discuss how and where to proceed. One possibility is a discussion within a
designated W3C Community Group, say, LD4LT
(https://www.w3.org/community/ld4lt), but we can discuss other options as
well.

Best regards,
Christian

* This is why other, more specialized formats do exist, e.g., the LAPPS
Interchange Format [https://wiki.lappsgrid.org/interchange/overview.html],
CoNLL-RDF [https://github.com/acoli-repo/conll-rdf/blob/master/owl],
RDF-NAF [http://wordpress.let.vupr.nl/naf/], formats for Interlinear
Glossed Text [https://github.com/acoli-repo/ligt].

** Generic (pre-RDF) vocabularies for linguistic annotations in general
include LAF [https://www.iso.org/standard/37326.html] and LAF
implementations such as PAULA
[https://www.sfb632.uni-potsdam.de/en/paula.html] and its OWL2/DL
serialization POWLA [http://purl.org/powla].


-- 
All the best,
Sebastian Hellmann

Director of Knowledge Integration and Linked Data Technologies (KILT)
Competence Center
at the Institute for Applied Informatics (InfAI) at Leipzig University
Executive Director of the DBpedia Association
Projects: http://dbpedia.org, http://nlp2rdf.org,
http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org