[openbiblio-dev] Issue 34 in openbiblio: paraphrased as give entities URIs

Ben O'Steen bosteen at gmail.com
Wed Feb 23 13:01:38 UTC 2011


I believe the current hash URIs are based on the fingerprint you
mention, derived from the entities they already link together. However,
I've yet to recreate the formatting used to create the hash value - is
this code somewhere?

What we could do is to have a simple counter (starting at, say, 1000)
and go through the BNB records in order, as there is a numerical
sequence to them, giving every (person) *instance* a number, e.g.
http://bibliographica.org/entity/1003. A 'John Smith' on one record
would get a different number to a 'John Smith' on a second. I think we
should accept that if the name matches and there is a matching birth
date (and perhaps a death date) then the authors should be treated as
the same across records, and we can keep a quick mapping as we give
them these human-readable URIs.
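
To make the counter idea concrete, here is a rough sketch in Python. It
is only an illustration: the class name EntityMinter, the mint() method
and the choice to key the mapping on (lower-cased name, birth date) are
my assumptions, not existing openbiblio code. A death date could be
added to the key in the same way.

```python
from itertools import count

BASE = "http://bibliographica.org/entity/"


class EntityMinter:
    """Hypothetical helper: hands out sequential, human-readable URIs."""

    def __init__(self, start=1000):
        self._counter = count(start)
        # (normalised name, birth date) -> URI already handed out
        self._known = {}

    def mint(self, name, birth=None):
        # No birth date: treat every occurrence as a separate instance.
        if birth is None:
            return BASE + str(next(self._counter))
        # With a birth date: reuse the URI across records.
        key = (name.strip().lower(), birth)
        if key not in self._known:
            self._known[key] = BASE + str(next(self._counter))
        return self._known[key]
```

Two undated 'John Smith' records get two different numbers, while two
records with the same name and birth date share one.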

We can then play favorites and promote some of the more famous authors
to have a number under 1000 as we go. Silly, but a big win for
usability and just the sheer visual impact of it.

We can then collapse and interlink URIs based on simple rules on a
gradual basis. To begin with, I'd suggest simply changing the URIs;
later, once people begin to link in to the dataset, we can use sameAs
to maintain link integrity. I'm sure there is a semantic debate
to be had on whether to use sameAs or seeAlso to link to things like
VIAF.
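
As a sketch of what "use sameAs to maintain link integrity" could look
like, the snippet below writes an N-Triples line pointing a retired URI
at the one that survives a merge. The function names triple() and
collapse() are made up for illustration, and the entity numbers in the
usage note are just examples; only the owl:sameAs and rdfs:seeAlso
property URIs are the real W3C ones.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def triple(subject, predicate, obj):
    """Serialise one URI-to-URI statement as an N-Triples line."""
    return "<%s> <%s> <%s> ." % (subject, predicate, obj)


def collapse(retired_uri, surviving_uri, predicate=OWL_SAME_AS):
    """Hypothetical helper: keep a retired URI resolvable after a merge."""
    return triple(retired_uri, predicate, surviving_uri)
```

For example, collapse("http://bibliographica.org/entity/1003",
"http://bibliographica.org/entity/1001") gives a sameAs statement from
1003 to 1001. Passing predicate=RDFS_SEE_ALSO would produce the weaker
seeAlso link instead, which is the choice still to be debated for VIAF.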

We may be able to restrict this gradual approach to just the difficult
entity types, such as human names. Placenames could be brought together
under the same URI from the first pass without much difficulty. A hash
URI would be the lowest-cost option for them: we lose human-readable
URIs but gain a straightforward link across records. (However,
based on the survey of placenames I did earlier, we might need to
remove/ignore certain pieces of punctuation and whitespace.)
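
A minimal sketch of the placename case, assuming we lower-case the
name, drop all punctuation and collapse whitespace before hashing. The
exact normalisation rules, the use of SHA-1 and the /entity/ prefix are
all my assumptions here; this is not the fingerprint code that
generated the current hash URIs, which I still haven't seen.

```python
import hashlib
import re

BASE = "http://bibliographica.org/entity/"  # assumed prefix


def normalise_place(name):
    """Lower-case, strip punctuation, collapse runs of whitespace."""
    name = name.lower()
    name = re.sub(r"[^\w\s]", "", name)  # drop anything not word/space
    return " ".join(name.split())


def place_uri(name):
    """Hypothetical helper: same normalised placename -> same URI."""
    digest = hashlib.sha1(normalise_place(name).encode("utf-8")).hexdigest()
    return BASE + digest
```

With this, 'London :' and ' london' end up on the same URI, so records
link together without any lookup table.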

Ben



