[openbiblio-dev] Plotting timelines of the 'birth' of IUCr articles on a map

Ben O'Steen bosteen at gmail.com
Mon Jan 10 02:47:22 UTC 2011

IUCr data visualisation


All the authors in the IUCr dataset have a rough address associated with
them, and with a bit of tweaking and adjusting I've been able to get
some semblance of matches for their lat,long locations.

Of the 3774 unique address lines in the set, I've found a location for
2796 of them (about 74%) - I'm sure with a few more passes across the
data we can improve that, but that proportion should be enough to start
with.

To visualise this, I'm using a handy bit of js that binds google maps
and simile timeline - http://code.google.com/p/timemap/. To be specific,
the functionality I'm using is the one demo'd in
http://timemap.googlecode.com/svn/trunk/examples/progressive.html which
is the progressive, on-demand loading of data from a date range.

The data is sparql'd via a fast and loose SELECT and this forms the
basis of the data sent to the js app. The address lookups are held in a
redis db (md5 hash of the address -> string holding the lat, long and
type of match). Using the Redis pipeline feature makes it straightforward
and responsive to look up a number of these in one go and return them.
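
As a rough illustration of the pipelined lookup, here is a minimal sketch. It assumes a redis-py style client (anything with a pipeline() method), and the key scheme (md5 hex digest of the stripped address line) and the "lat,long,match_type" value layout are guesses for illustration, not necessarily what the scripts in the repository use. The helper names are hypothetical too.

```python
import hashlib


def address_key(address):
    """Hypothetical key scheme: md5 hex digest of the stripped address line."""
    return hashlib.md5(address.strip().encode("utf-8")).hexdigest()


def parse_location(value):
    """Assumed value layout: 'lat,long,match_type'. None means no match cached."""
    if value is None:
        return None
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    lat, lng, match_type = value.split(",", 2)
    return float(lat), float(lng), match_type


def lookup_addresses(client, addresses):
    """Queue one GET per address and send them all in a single round trip."""
    pipe = client.pipeline()
    for address in addresses:
        pipe.get(address_key(address))
    # execute() returns the replies in the order the commands were queued.
    results = pipe.execute()
    return {a: parse_location(v) for a, v in zip(addresses, results)}
```

With a real server this would be called as lookup_addresses(redis.Redis(), list_of_address_lines); addresses with nothing cached come back as None.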

NB the SPARQL is (the two %s placeholders take the start and end of the
date range):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?c ?name ?address ?doi ?title ?date WHERE {
 ?doi <http://purl.org/dc/elements/1.1/title> ?title . 
 ?doi <http://purl.org/dc/elements/1.1/date> ?date .
 ?doi <http://purl.org/dc/terms/creator> ?c .
 ?c <http://xmlns.com/foaf/0.1/name> ?name .
 ?c <http://open.vocab.org/terms/recordedAddress> ?address .
 FILTER(?date > "%s" && ?date < "%s")
} LIMIT 400
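
For illustration, this is one way the two %s placeholders might be filled in. It is a sketch only: build_query is a hypothetical helper, and it assumes the dates in the store are plain ISO strings such as "2010-01-01". Round-tripping the input through datetime.date means anything that is not a plain date is rejected before it reaches the query string.

```python
from datetime import date

QUERY_TEMPLATE = """
SELECT DISTINCT ?c ?name ?address ?doi ?title ?date WHERE {
 ?doi <http://purl.org/dc/elements/1.1/title> ?title .
 ?doi <http://purl.org/dc/elements/1.1/date> ?date .
 ?doi <http://purl.org/dc/terms/creator> ?c .
 ?c <http://xmlns.com/foaf/0.1/name> ?name .
 ?c <http://open.vocab.org/terms/recordedAddress> ?address .
 FILTER(?date > "%s" && ?date < "%s")
} LIMIT 400"""


def build_query(start, end):
    """Return the SELECT for the range (start, end), both ISO date strings."""
    # fromisoformat raises ValueError on anything that is not a date,
    # so stray quotes or SPARQL fragments never get interpolated.
    start_iso = date.fromisoformat(start).isoformat()
    end_iso = date.fromisoformat(end).isoformat()
    if start_iso >= end_iso:
        raise ValueError("start must be earlier than end")
    return QUERY_TEMPLATE % (start_iso, end_iso)
```

Calling build_query("2010-01-01", "2010-02-01") gives the query for January 2010.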

As SPARQL is by far the slowest step, we could optimise by
pre-generating the monthly SPARQL output and caching it, no problem, but
I think the current on-demand query is more generic as it stands.
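
To make the caching idea concrete, here is a small sketch of a per-month cache. It is not what the controller does at the moment; month_bounds and MonthlyCache are hypothetical names, and fetch stands in for whatever function actually runs the SPARQL for a date range.

```python
from datetime import date


def month_bounds(year, month):
    """Return ISO dates for the first day of the month and of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


class MonthlyCache:
    """Keep the rows for each (year, month) after the first fetch."""

    def __init__(self, fetch):
        # fetch is any callable(start_iso, end_iso) -> rows (an assumption).
        self._fetch = fetch
        self._rows = {}

    def get(self, year, month):
        key = (year, month)
        if key not in self._rows:
            # Only the first request for a month pays for the slow query.
            self._rows[key] = self._fetch(*month_bounds(year, month))
        return self._rows[key]
```

One thing to watch: the FILTER uses a strict ">", so an article dated exactly on the first of a month would fall outside every monthly range unless the comparison is changed to ">=".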

I have put the scripts, geocoded address cache and the Pylons controller
that provides the backend service into
https://github.com/benosteen/IUCR-Geocoding - the only thing left out is
the SPARQL query I used to pull all the addresses from the endpoint,
but I think that's simple enough to ignore for now!

It would be fantastic to add colour to the pegs in the map, each colour
connected to a single paper in a given month - the pegs otherwise show
authors and it is hard to see how spread out the authorship of a given
paper is.

