[public-lod2] Can we create better links by playing games?

Wed Jun 20 19:29:16 UTC 2012

Hello Michael,

On 20.06.2012 10:03, Michael Hopwood wrote:
> Hello Jens,
>
> I'm sad to say I couldn't get the game going because it froze before
> I could answer my first question.

I'm sorry to hear that. It seems to work OK for most people, but it's 
also quite heavy on CPU. Which browser did you use?

> What this game is talking about, though, is something that libraries,
> museums and archives have been doing for a long time, and are still
> doing (although now there's much more outsourcing due to the fact
> that cataloguing and classification are perceived as high-cost, low
> return activities compared with full-text search of various kinds).
>
> Librarians consider these activities essential, not necessarily
> tedious...!

No offence was intended. Of course, those activities are very important. 
We have validated thousands of links in our group to improve the 
precision and recall of interlinking. "Difficult" and/or 
"time-consuming" would have been better phrases in that context.

> The way this would be done in practice would not just be to check one
> source against one (almost randomly-selected) "expert"

It is a bit more complex. The final score of a link is a combination of:
* a heuristic from a LIMES/SILK link specification
* the opinion of several users who "played" that link
If only a single person has validated the link in the game (and there is 
a low confidence in the link specification), then the overall confidence 
in the link will be low. At the moment, the research hypothesis is that 
combining two different sources of evidence improves the overall 
precision and recall of linking tasks.
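
As a rough illustration only, the sketch below treats the score from the 
link specification as a prior and the players' votes as additional 
evidence. The function name, the weighting scheme and the `prior_weight` 
parameter are assumptions made for this example; they are not the actual 
scoring used in the game.

```python
def combined_confidence(heuristic_score, yes_votes, no_votes, prior_weight=2.0):
    """Blend a link-specification score with player votes (illustrative only).

    heuristic_score: score in [0, 1] from the link specification (assumed scale).
    yes_votes / no_votes: number of players who accepted / rejected the link.
    prior_weight: hypothetical knob, i.e. how many votes the heuristic counts for.
    """
    if not 0.0 <= heuristic_score <= 1.0:
        raise ValueError("heuristic_score must lie in [0, 1]")
    if yes_votes < 0 or no_votes < 0:
        raise ValueError("vote counts must be non-negative")
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")

    total_votes = yes_votes + no_votes
    # Weighted average: the heuristic acts like `prior_weight` pseudo-votes,
    # so its influence shrinks as more real votes come in.
    return (prior_weight * heuristic_score + yes_votes) / (prior_weight + total_votes)


# A weak heuristic with one positive vote vs. the same heuristic with five.
single_vote = combined_confidence(0.3, yes_votes=1, no_votes=0)
many_votes = combined_confidence(0.3, yes_votes=5, no_votes=0)
```

With these assumed weights, a single positive vote on a weakly scored 
link moves the result only part of the way up, while several agreeing 
votes raise it considerably further.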

> but to
> "triangulate" by comparing the questionable source (e.g. DBPedia)
> with a known, trusted source (e.g. CIA World Factbook) and then
> having the "expert" decide on each match's plausibility...

That is a very valid point. If a very trustworthy reference model 
exists, that is certainly very helpful. For those cases in which such 
reference models (e.g. VIAF) exist, triangulation is indeed a good 
concept and I will think about how to integrate that in our work on 
linking in general. For the game itself, we opted for a generic solution 
for now, which makes it possible to validate a linkset by just 
specifying the SPARQL endpoints and a template (for the visual 
presentation).
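
To make "endpoints plus a template" more concrete, here is a minimal 
sketch of what such a specification could look like. The class name, the 
field names, the example.org URLs and the placeholder syntax are all 
hypothetical and chosen for illustration; they do not reflect the game's 
real configuration format.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class LinksetValidationTask:
    """Hypothetical description of one linkset to be validated."""
    source_endpoint: str  # SPARQL endpoint of the source dataset
    target_endpoint: str  # SPARQL endpoint of the target dataset
    template: str         # presentation text with $source_label / $target_label

    def render(self, source_label, target_label):
        # Fill the presentation template for one candidate link.
        return Template(self.template).substitute(
            source_label=source_label, target_label=target_label
        )


task = LinksetValidationTask(
    source_endpoint="http://example.org/source/sparql",  # placeholder URL
    target_endpoint="http://example.org/target/sparql",  # placeholder URL
    template="Is '$source_label' the same as '$target_label'?",
)
question = task.render("Leipzig", "Leipzig, Germany")
```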

Kind regards,

Jens

-- 
Dr. Jens Lehmann
Head of AKSW/MOLE group, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc