[openbiblio-dev] Issue 34 in openbiblio: paraphrased as give entities URIs

Rufus Pollock rufus.pollock at okfn.org
Sat Feb 26 12:19:13 UTC 2011


Just to say I'm +1 on this idea. The resulting plan can be added to
the relevant issue (which you own atm!):
<https://bitbucket.org/okfn/openbiblio/issue/34/materialize-entities-on-bibliographicaorg>

On 23 February 2011 13:01, Ben O'Steen <bosteen at gmail.com> wrote:
> I believe the current hash URIs are based on the fingerprint you
> mention, based on the entities they already link together. However,
> I've yet to recreate the formatting used to create the hash value - is
> this code somewhere?
>
> What we could do is to have a simple counter (starting at say 1000)

I'd suggest 10000 :)

> and go through the BNB records in order as there is a numerical
> sequence to them, giving every (person) *instance* a number -
> http://bibliographica.org/entity/1003. A 'John Smith' on one record
> would get a different number to a 'John Smith' on a second. I think we
> should accept that if there is a birth date (and perhaps a death date)
> then the authors should be treated as the same across records and we
> can keep a quick mapping as we give them these human readable URIs.
>
> We can then play favorites and promote some of the more famous authors
> to have a number under 1000 as we go. Silly, but a big win for
> usability and just the sheer visual impact of it.
>
> We can then collapse and interlink URIs gradually, based on simple
> rules. I'd suggest simply changing the URIs to begin with and,
> later, using sameAs to maintain link integrity once people begin to
> link in to the dataset. I'm sure there is a semantic debate to be
> had on whether to use sameAs or seeAlso to link to things like
> VIAF.

I agree, though it would be nice to know what the 'openness' of VIAF
is (or at least of its identifiers). Also, how does VIAF generate
identifiers?
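
For concreteness, the counter scheme quoted above could be sketched
roughly as follows. This is only an illustration under stated
assumptions: the class and method names are hypothetical, the start
value is a parameter (10000 here), and "same author" is assumed to
mean an identical name plus an identical birth date (and death date,
if present).

```python
from itertools import count

# URI pattern taken from the example in the quoted message.
BASE = "http://bibliographica.org/entity/"


class EntityMinter:
    """Hypothetical helper: hand out sequential entity URIs."""

    def __init__(self, start=10000):
        self._next_id = count(start)
        # (name, birth, death) -> URI; the "quick mapping", kept only
        # for persons that carry at least a birth date.
        self._known = {}

    def mint(self, name, birth=None, death=None):
        # No birth date: treat every occurrence as a separate instance.
        if birth is None:
            return BASE + str(next(self._next_id))
        # Dated person: reuse the URI already given to the same key.
        key = (name, birth, death)
        if key not in self._known:
            self._known[key] = BASE + str(next(self._next_id))
        return self._known[key]


minter = EntityMinter()
a = minter.mint("Smith, John")                # undated, record 1
b = minter.mint("Smith, John")                # undated, record 2
c = minter.mint("Smith, John", birth="1920")  # dated, record 3
d = minter.mint("Smith, John", birth="1920")  # dated, record 4
```

Here a and b get different URIs, while c and d share one. Promoting
famous authors to low numbers would be a separate, manual remapping.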

> We may be able to constrain this to just the difficult entity types,
> such as human names. Placenames could be brought together under the
> same URI from the first pass without much difficulty. A hash URI for
> them would be the lowest cost in this case: we would lose human
> readable URIs, but gain a straightforward link across records.
> (However, based on the survey of placenames I did earlier, we might
> need to remove/ignore certain pieces of punctuation and whitespace.)

Can you expand a bit on what you mean by hash URI? Do you mean a
UUID, or a hash of some attribute of the record?
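
To make the two readings concrete, here is a rough sketch of each.
Everything in it is an assumption for illustration: the function
names, the choice of SHA-1, and the normalisation rule (lower-case,
drop punctuation, collapse whitespace) are not taken from the existing
code, whose actual fingerprint format is still unknown.

```python
import hashlib
import re
import uuid

BASE = "http://bibliographica.org/entity/"


def normalise_placename(name):
    """Lower-case, drop punctuation, collapse whitespace (assumed rules)."""
    stripped = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(stripped.split())


def attribute_hash_uri(name):
    # Reading 1: hash of an attribute. Deterministic, so the same
    # normalised placename maps to the same URI on every record.
    digest = hashlib.sha1(normalise_placename(name).encode("utf-8"))
    return BASE + digest.hexdigest()


def random_uuid_uri():
    # Reading 2: a random UUID. Unique per call, so linking records
    # would need a separate lookup table.
    return BASE + str(uuid.uuid4())
```

With the first reading, "London :" and " london." end up under one
URI without any lookup; with the second, they would not unless a
mapping is kept alongside.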

Rufus



