[open-bibliography] Disambiguation, deduplication and 'ideals'
Ben O'Steen
bosteen at gmail.com
Wed Sep 1 08:19:28 UTC 2010
On Wed, 2010-09-01 at 05:08 +0200, Thomas Krichel wrote:
> Karen Coyle writes
>
> > As you can see, the questions go on and on!
>
> Deduplication is also service context dependent. ...
I absolutely agree, and I'll add that whenever you de-duplicate for any
of these reasons, you will be using a probabilistic method of some kind,
99% of the time ;) Whether it's a Fellegi-Sunter based whole-record
dedupe or single-field (e.g. id) matching, there will be false
positives and false negatives.
Your success rate will always be <100%, and the degree of success will
vary depending on who is doing the de-duplication and for what purpose.
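
To make the Fellegi-Sunter idea concrete, here is a rough sketch of
whole-record scoring. It is an illustration only: the field names, the
m/u probabilities and the thresholds are all made-up assumptions, not
values estimated from any real bibliographic data, and exact string
comparison after normalisation stands in for whatever comparison a real
service would use.

```python
import math

# Hypothetical per-field probabilities (illustrative, not estimated from data):
#   m = P(field agrees | the two records describe the same work)
#   u = P(field agrees | the two records describe different works)
FIELD_PARAMS = {
    "title": (0.95, 0.01),
    "year": (0.90, 0.05),
    "first_author": (0.92, 0.02),
}


def normalise(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(str(value).lower().split())


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the log2 agreement/disagreement weights over all compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value is treated as no evidence either way
        if normalise(a) == normalise(b):
            total += math.log2(m / u)              # agreement adds weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Two thresholds give three outcomes; where they sit depends on the service."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    a = {"title": "Open  Bibliography", "year": 2010, "first_author": "Coyle"}
    b = {"title": "open bibliography", "year": "2010", "first_author": "coyle"}
    print(classify(match_weight(a, b)))
```

Moving `lower` and `upper` is where the service context comes in: a
strict upper threshold trades false positives for more records left in
the "possible match" pile, and a lax one does the reverse. No setting
removes both kinds of error.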
Ben