[open-bibliography] Place of Publication data from the BL dataset

William Waites ww at eris.okfn.org
Fri Nov 26 00:26:49 UTC 2010


* [2010-11-25 18:58:27 -0500] Tom Morris <tfmorris at gmail.com> wrote:
]
] One of the difficulties of the current dataset is that it has no URIs
] assigned and very few strong identifiers of any type that can be used
] as handles to reference things.  You could, for example, go through
] the extracted publication places and group duplicates together using
] Google Refine, but you'd have no way to use that cleaned data set to
] improve the original or any of the extracted copies.

Indeed. One of the reasons I didn't invent unique URIs for the places
when doing the first step of transformation for what is in
http://bnb.bibliographica.org/ is that in effect it would just be a
process of skolemisation -- not very helpful.

However, suppose you look at a book and figure out, in whatever way,
that it was published in Cambridge, Ontario (for argument's sake).
What is needed to hook that onto the records is ultimately a SPARQL
query that looks like,

INSERT INTO <book_uri>
 { ?place owl:sameAs <http://sws.geonames.org/5913695/> }
WHERE
 { <book_uri> isbd:hasPlaceOfPublicationProductionDistribution ?place }

Or even better, loop over all books with the same publisher and place
name label and perform the same operation.
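
As a rough sketch of that loop, the query above can be generated per
book from a small Python helper. The function names and the example
book URIs are hypothetical, and the owl: and isbd: prefixes are assumed
to be declared wherever the query is finally run, as in the query
above.

```python
# Property name copied from the query in this message.
PLACE_PROPERTY = "isbd:hasPlaceOfPublicationProductionDistribution"

# Characters that must not appear inside a <...> IRI reference.
_FORBIDDEN = set('<>"{}|^`\\ \t\r\n')


def _check_uri(uri):
    """Reject strings that cannot be written safely as <uri>."""
    if not uri or any(ch in _FORBIDDEN for ch in uri):
        raise ValueError("not usable as an IRI: %r" % (uri,))
    return uri


def ground_place_query(book_uri, place_uri):
    """Build the INSERT query linking a book's place node to place_uri."""
    book = _check_uri(book_uri)
    place = _check_uri(place_uri)
    return (
        f"INSERT INTO <{book}>\n"
        f" {{ ?place owl:sameAs <{place}> }}\n"
        f"WHERE\n"
        f" {{ <{book}> {PLACE_PROPERTY} ?place }}"
    )


def ground_place_queries(book_uris, place_uri):
    """One query per book; the caller is assumed to have already picked
    books that share the same publisher and place name label."""
    return [ground_place_query(uri, place_uri) for uri in book_uris]
```

Calling ground_place_queries(["http://example.org/book/1",
"http://example.org/book/2"], "http://sws.geonames.org/5913695/")
would give two queries of the same shape as the one above, one per
book.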

Using owl:sameAs like this to ground a blank node is a first step in
disambiguation. It only adds a piece of information and doesn't
remove anything. Once we are sure enough that this is correct, we can
go and replace the blank node with the URI, which is a more invasive
operation because it involves deleting statements.
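
The second, more invasive step might be generated along these lines.
This is only a sketch: it assumes a store that accepts SPARQL 1.1
Update syntax (WITH / DELETE / INSERT), which differs from the INSERT
INTO form used earlier, and the function name is hypothetical. The
owl: and isbd: prefixes are again assumed to be declared elsewhere.

```python
# Property name copied from the query in this message.
PLACE_PROPERTY = "isbd:hasPlaceOfPublicationProductionDistribution"


def replace_blank_node_query(book_uri, place_uri):
    """Build an update that swaps a grounded blank node for place_uri.

    Only blank nodes already asserted owl:sameAs place_uri are touched,
    so the update is a no-op for books that were never grounded.
    """
    link = f"<{book_uri}> {PLACE_PROPERTY} ?place ."
    same = f"?place owl:sameAs <{place_uri}> ."
    return "\n".join([
        f"WITH <{book_uri}>",
        # Drop the link to the blank node and the now redundant sameAs.
        f"DELETE {{ {link} {same} }}",
        # Point the book straight at the place URI instead.
        f"INSERT {{ <{book_uri}> {PLACE_PROPERTY} <{place_uri}> . }}",
        f"WHERE  {{ {link} {same} FILTER(isBlank(?place)) }}",
    ])
```

Any other statements hanging off the blank node (a label, say) are not
touched by this sketch and would need to be moved or deleted
separately.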

If we can get there, with some sort of fun game that people can play
that creates SPARQL queries like this, we can fix the data.

The nice thing is, when we record provenance (changes) we can keep
around the queries that were run. They are much clearer and more
understandable (especially if they become only slightly more elaborate
than the one above) than the brute-force transaction journal
(changeset) approach to provenance.
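
To make that concrete, here is a minimal sketch of keeping the queries
themselves as the provenance record, one JSON line per query. The
field names (agent, timestamp, query) and the function names are
assumptions for illustration, not anything the current store does.

```python
import json
from datetime import datetime, timezone


def provenance_entry(query, agent, when=None):
    """Describe one applied query: who ran it, when, and its full text."""
    when = when or datetime.now(timezone.utc)
    return {
        "agent": agent,
        "timestamp": when.isoformat(),
        "query": query,
    }


def append_entry(log_lines, entry):
    """Append the entry to an in-memory log as a single JSON line."""
    log_lines.append(json.dumps(entry, sort_keys=True))


def replay(log_lines):
    """Return the logged queries in the order they were recorded."""
    return [json.loads(line)["query"] for line in log_lines]
```

Because each entry carries the whole query, reading the log back gives
the sequence of edits in a form a person can review, rather than a
list of added and removed statements.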

Cheers,
-w
-- 
William Waites
http://eris.okfn.org/ww/foaf#i
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



