[open-bibliography] Disambiguation, deduplication and 'ideals'
Karen Coyle
kcoyle at kcoyle.net
Wed Sep 1 02:45:12 UTC 2010
Doesn't a lot of this depend on how you define "same"? This is rather
complex today due to the replication of resources in various formats.
Is it the same if one citation is to the print copy of the article,
and another cites a postprint in a repository? (The latter may have
different formatting or different pagination depending on how it was
produced.) Is it that the texts are the same, or that the actual
publication is the same? (Which would mean that the article in the
journal and the postprint are considered different resources.) Do you
intend to distinguish between preprints and published articles? Would
you consider a piece of music on CD and its online copy to be the
same, even though the compression is different? As you can see, the
questions go on and on!
There's also the question of confidence: what degree of confidence do
you need before declaring "same"? Do you prefer to err on the side of
over-bundling or under-bundling? This interacts with your definition
of sameness -- for example, let's say you decide that if you have two
texts with the same author, same title, and same year, they are the
same. You won't check for format, length, etc. That would probably
bring together preprints, postprints, and possibly even different
versions of the same text. So in that way you have defined "same"
rather broadly. Or you could decide that two texts are the same only
if they both cite precisely the same publication with the same
pagination and full date. That is a different definition of same, with
more confidence.
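
To make the two definitions concrete, here is a minimal Python sketch of a broad match key (author, title, year) next to a strict one (publication, pagination, full date). The Citation class, its field names, the normalisation and the sample records are all assumptions made up for illustration; they do not come from any existing bibliographic tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Citation:
    # Hypothetical record layout, for illustration only.
    author: str
    title: str
    year: int
    journal: Optional[str] = None
    pages: Optional[str] = None
    full_date: Optional[str] = None


def _norm(text: str) -> str:
    # Crude normalisation: lower-case and collapse whitespace.
    return " ".join(text.lower().split())


def broad_key(c: Citation) -> Optional[Tuple]:
    # Broad "same": author, title and year only; format and length are ignored.
    return (_norm(c.author), _norm(c.title), c.year)


def strict_key(c: Citation) -> Optional[Tuple]:
    # Strict "same": same publication, same pagination, same full date.
    # A record missing any of these cannot be matched under this definition.
    if c.journal is None or c.pages is None or c.full_date is None:
        return None
    return (_norm(c.journal), _norm(c.pages), c.full_date)


def same(a: Citation, b: Citation,
         key: Callable[[Citation], Optional[Tuple]]) -> bool:
    # Two citations are "same" when both have a key and the keys are equal.
    ka, kb = key(a), key(b)
    return ka is not None and ka == kb


# Made-up example: a journal article and a repository postprint of it.
printed = Citation("Smith, J.", "On Sameness", 2010,
                   journal="Journal of Examples", pages="1-10",
                   full_date="2010-03-01")
postprint = Citation("Smith, J.", "On  sameness", 2010)

print(same(printed, postprint, broad_key))
print(same(printed, postprint, strict_key))
```

Under the broad key the article and its postprint are bundled together; under the strict key they stay apart, because the postprint carries no pagination or full date to compare. Choosing between the two is the over-bundling versus under-bundling decision described above.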
Hopefully, once you determine what you mean by "same", you can then
decide what you want to apply owl:sameAs to.
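
To see why that choice matters, here is a rough Python sketch of what the sameAs rule quoted further down does to a small set of triples. It uses plain tuples, not a real RDF library or reasoner, and it only implements that one rule plus symmetry; the dc:title property and its value are assumptions added for illustration.

```python
SAME_AS = "owl:sameAs"


def same_as_closure(triples):
    # Apply, until nothing new appears:
    #   symmetry:  x sameAs y            =>  y sameAs x
    #   copying:   x sameAs y . x p o    =>  y p o
    inferred = set(triples)
    while True:
        new = set()
        for (x, p, y) in inferred:
            if p != SAME_AS:
                continue
            new.add((y, SAME_AS, x))
            for (s, q, o) in inferred:
                if s == x:
                    new.add((y, q, o))
        if new <= inferred:
            return inferred
        inferred |= new


# The bundle from the quoted message, plus one assumed property on bibrec_i.
triples = {
    (":b1", "rdf:type", "Bundle"),
    (":b1", SAME_AS, ":bibrec_i"),
    (":b1", SAME_AS, ":citerec_i"),
    (":bibrec_i", "dc:title", "On Sameness"),  # hypothetical property
}

closure = same_as_closure(triples)
for triple in sorted(closure):
    print(triple)
```

In this sketch every property ends up on every member of the bundle, and both records are typed as Bundle as well. Whatever you bundle under a broad definition of "same" gets merged just as thoroughly as under a strict one.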
As you can tell, I have agonized over this one for many a long hour. :-)
kc
Quoting William Waites <william.waites at okfn.org>:
> On 10-08-31 10:54, Ben O'Steen wrote:
>> :b1 a Bundle
>>
>> sameas :bibrec_i
>> sameas :citerec_i
>> opmv:wasGeneratedBy :p1
>> created: 2010-08-......
>>
>
> I think this is reasonable as far as it goes. What you
> haven't treated is how the properties present on
> bibrec_i, citerec_i get migrated around.
>
> The OWL rule that we're trying to bypass is,
>
> { ?x owl:sameAs ?y . ?x ?p ?o } => { ?y ?p ?o }
>
> so do all properties from bibrec_i and citerec_i
> get migrated to b1 and then any post-dedup queries
> would typically be made against b1?
>
> Also mind the implication of sameas. It commutes
> so what you are also saying here is,
>
> :b1 a Person .
> :bibrec_i a Bundle .
> :citerec_i a Bundle .
>
> So maybe you don't really mean owl:sameAs on the
> bundle.
>
> But if you don't mean owl:sameAs there, then where
> do you put the properties? (Potential answer: pick
> either bibrec_i or citerec_i arbitrarily and have a
> specific predicate in the bundle to indicate that it
> is the "primary" resource).
>
> Cheers,
> -w
>
> --
> William Waites <william.waites at okfn.org>
> Mob: +44 789 798 9965 Open Knowledge Foundation
> Fax: +44 131 464 4948 Edinburgh, UK
>
> RDF Indexing, Clustering and Inferencing in Python
> http://ordf.org/
>
--
Karen Coyle
kcoyle at kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet