[okfn-labs] nomenklatura - thinking about naming

Gregor Aisch gregor.aisch at okfn.org
Wed Apr 24 08:39:04 UTC 2013


I would try to align the terminology with other domains. For instance, I think this process is called Named Entity Normalization in scientific publications.

Here's what I just found in the paper I linked above:

> The task of record linkage (RL) is to find entries that refer to the same entity in different data sources. [...] The task proved important because data sources have varying ways of referring to the same real-world entity due to, e.g., different naming conventions, misspellings or use of abbreviation. The task of reference normalization is to analyze and detect these different references. When we consider the special case of this problem for natural language texts, we have to recognize entities in a text and resolve these references either to entities that exist within the document or to real-world entities. These two steps constitute the named entity normalization problem.

So they picked the following names:

1) Entity, also 'Real-World Entity'
2) Entry, Reference
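
To make these two names concrete, here is a rough sketch in Python of how an entity and its references could fit together for the Merkel example quoted below. Everything in it is an assumption for illustration: `Entity`, `normalize`, `link` and the `HONORIFICS` list are made-up names, not nomenklatura's actual domain model or matching logic.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import List

# Hypothetical list of titles to ignore; a real service would need far more.
HONORIFICS = {"mr", "mrs", "ms", "dr"}


def normalize(name: str) -> str:
    """Reduce a raw name to a crude comparison key (illustrative only)."""
    # Strip accents so that e.g. "é" and "e" compare equal.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, drop punctuation, and split into word tokens.
    tokens = re.findall(r"\w+", text.lower())
    # Drop titles and sort tokens so "MERKEL, Angela" equals "Angela Merkel".
    return " ".join(sorted(t for t in tokens if t not in HONORIFICS))


@dataclass
class Entity:
    """The canonical, real-world thing; `references` holds its known entries."""
    name: str
    references: List[str] = field(default_factory=list)


def link(entity: Entity, raw_names: List[str]) -> List[str]:
    """Attach raw names that match automatically; return the rest for manual review."""
    key = normalize(entity.name)
    unresolved = []
    for raw in raw_names:
        if normalize(raw) == key:
            entity.references.append(raw)
        else:
            unresolved.append(raw)
    return unresolved
```

With an `Entity("Angela Merkel")`, calling `link` on "Mrs. Angela Merkel", "MERKEL, Angela" and "Angela Merkel, CDU" attaches the first two as references and hands back the third, which would be one of the harder cases left for a human to merge.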
 

Cheers,
Gregor



On 24.04.2013 at 10:01, Friedrich Lindenberg <friedrich at pudo.org> wrote:

> Hi all, 
> 
> I want to brush up the interface and docs for nomenklatura (http://nomenklatura.pudo.org/) at some point, and the hardest thing about this project is naming. Let me give you my one-sentence summary of what nomenklatura does:
> 
> Nomenklatura is a data cleansing service that provides automated and manual options for merging multiple forms of a name into a canonical form. 
> 
> Example: when going through political databases, you may encounter not just "Angela Merkel", but also "Angela Merkel, CDU", "Angela Merkel, Chancellor", "Mrs. Angela Merkel", "MERKEL, Angela" etc. Nomenklatura does some basic normalisation and matching to solve the easy cases here, and then gives a nice UI to solve the harder merges by hand. In the end, you would have a single entry with a list of aliases. 
> 
> At the moment, the canonical form is called a "Value" in the domain model, while the aliases are called "Link". This has led to confusion. I therefore want to rename the domain entities, so here are my questions:
> 
> 1) What would people on this list call the canonical value (e.g. Entity, Lemma, ...)? 
> 
> 2) What about the aliases (e.g. Alias, Link, SurfaceForm)? 
> 
> 3) How would you pitch it? 
> 
> Thanks for any help! 
> 
> - Friedrich 
