[open-bibliography] Disambiguation, deduplication and 'ideals'

Ben O'Steen bosteen at gmail.com
Wed Sep 1 08:19:28 UTC 2010


On Wed, 2010-09-01 at 05:08 +0200, Thomas Krichel wrote:
> Karen Coyle writes
> 
> > As you can see, the questions go on and on!
>  
>   Deduplication is also service context dependent. ...


I absolutely agree, and I'd add that whenever you de-duplicate for any
of these reasons, you will be using a probabilistic method of some
kind, 99% of the time ;) Whether it's a Fellegi-Sunter-based
whole-record dedupe or single-field (e.g. id) matching, there will be
false positives and false negatives.

Your success rate will always be <100%, and the degree of success will
vary depending on who did the de-duplication and for what purpose.
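
To make the Fellegi-Sunter idea concrete, here is a rough Python sketch
of whole-record scoring. It is only an illustration: the field names,
the m/u probabilities and the two thresholds are all made-up
assumptions, and a real service would estimate and tune them for its
own context. Each field adds a log-likelihood weight when it agrees and
subtracts one when it disagrees, and the total puts the pair into one
of three bands.

```python
import math

# Hypothetical per-field probabilities (assumed for illustration only):
#   m = P(field agrees | the two records describe the same work)
#   u = P(field agrees | the two records describe different works)
FIELD_PARAMS = {
    "title": (0.95, 0.01),
    "year": (0.90, 0.05),
    "first_author": (0.90, 0.02),
}


def normalise(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value).lower().split())


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the log2 agreement/disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in params.items():
        agree = normalise(rec_a.get(field, "")) == normalise(rec_b.get(field, ""))
        if agree:
            total += math.log2(m / u)              # evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # evidence against a match
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision; both thresholds are arbitrary assumptions."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


a = {"title": "A Theory for Record Linkage", "year": 1969, "first_author": "Fellegi"}
b = {"title": "a theory for  record linkage", "year": "1969", "first_author": "fellegi"}
c = {"title": "A Theory for Record Linkag", "year": 1969, "first_author": "Fellegi"}

print(classify(match_weight(a, b)))  # all three fields agree after normalising
print(classify(match_weight(a, c)))  # title typo: lands in the middle band
```

Where you put the two thresholds is exactly the service-context
decision: move them one way and you merge more true duplicates but also
more distinct records (false positives), move them the other way and
you miss more duplicates (false negatives).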

Ben




