[openbiblio-dev] Plotting timelines of the 'birth' of IUCr articles on a map

William Waites ww at eris.okfn.org
Mon Jan 10 12:23:41 UTC 2011


Hi Ben, I'm copying Richard Pope on this mail because I'm
not sure if he's on this list -- he's looking at some UI
stuff for bibliographica and your js examples doing sparql
mashup things might prove useful to him.

Another question is, can we get at a dump of this data to
see about including it in the bibliographica store itself?

And lastly, have you managed to do anything about getting
an nquads dump out of the store for the BNB data? We need
this both to give to some services, so that they stop crawling
the whole thing, and to give to FU-Berlin and DERI for passing
through Silk to infer linkage to other resources.

Cheers,
-w

* [2011-01-10 02:47:22 +0000] Ben O'Steen <bosteen at gmail.com> wrote:

] IUCr data visualisation
] 
] http://benosteen.com/timemap/index
] 
] All the authors in the IUCr dataset have a rough address associated with
] them, and with a bit of tweaking and adjusting I've been able to get
] some semblance of matches for their lat,long locations.
] 
] Of the 3774 unique address lines in the set, I've found something for
] 2796 of them (about 74%). I'm sure a few more passes across the data
] would improve that, but that proportion should be enough to start with.
] 
] To visualise this, I'm using a handy bit of js that binds google maps
] and simile timeline (http://code.google.com/p/timemap/). To be specific,
] the functionality I'm using is the one demo'd in
] http://timemap.googlecode.com/svn/trunk/examples/progressive.html which
] is the progressive, on-demand loading of data from a date range.
] 
] The data is sparql'd via a fast and loose SELECT and this forms the
] basis of the data sent to the js app. The address lookups are within a
] redis db (addr-md5 hash -> string holding the lat long and type of
] match). Using the Redis pipeline feature makes it straightforward and
] responsive to look up a number of these in one go and return them.
] 
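
A minimal sketch of that pipelined lookup, assuming a redis-py style
client and keys that are the hex MD5 of the UTF-8 address line. The key
scheme and the function names are guesses for illustration, not taken
from the repository, and the stored value is returned as-is rather than
parsed:

```python
import hashlib


def address_key(address):
    # Assumed key scheme: hex MD5 digest of the UTF-8 encoded address line.
    return hashlib.md5(address.encode("utf-8")).hexdigest()


def lookup_addresses(client, addresses):
    """Fetch the cached geocoding string for each address in one round trip.

    `client` is anything with a redis-py style pipeline(), e.g. redis.Redis().
    Returns {address: stored value, or None if nothing is cached}.
    """
    pipe = client.pipeline()
    for address in addresses:
        pipe.get(address_key(address))  # queued locally, not yet sent
    values = pipe.execute()  # all queued GETs go out together
    return dict(zip(addresses, values))
```

With a real server this would be called as
lookup_addresses(redis.Redis(), address_lines); results come back in the
order the GETs were queued, which is what makes the zip safe.
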
] NB the SPARQL is:
] 
] PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
] PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
] 
] SELECT DISTINCT ?c ?name ?address ?doi ?title ?date WHERE {
]  ?doi <http://purl.org/dc/elements/1.1/title> ?title . 
]  ?doi <http://purl.org/dc/elements/1.1/date> ?date .
]  ?doi <http://purl.org/dc/terms/creator> ?c .
]  ?c <http://xmlns.com/foaf/0.1/name> ?name .
]  ?c <http://open.vocab.org/terms/recordedAddress> ?address .
]  FILTER(?date > "%s" && ?date < "%s")
] } LIMIT 400
] 
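
For what it's worth, the two "%s" slots could be filled per month roughly
like this. This is only a sketch: it assumes the dates in the store
compare correctly as ISO "YYYY-MM-DD" strings, and the helper names are
made up for illustration:

```python
import datetime

# Only the FILTER line of the query is reproduced here.
FILTER_TEMPLATE = 'FILTER(?date > "%s" && ?date < "%s")'


def month_bounds(year, month):
    """Return (first day of month, first day of next month) as ISO strings."""
    start = datetime.date(year, month, 1)
    if month == 12:
        end = datetime.date(year + 1, 1, 1)  # roll over into January
    else:
        end = datetime.date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


def filter_for_month(year, month):
    # Substitute the lower and upper bound into the two %s placeholders.
    return FILTER_TEMPLATE % month_bounds(year, month)
```

Note that both comparisons in the FILTER are strict, so an item dated
exactly on the lower bound would not match.
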
] As SPARQL is by far the slow point, we could optimise by pre-generating
] the monthly sparql output and caching it, no problem, but I think this
] is more generic as it stands.
] 
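
The per-month caching idea could be as small as the following sketch.
The class and the fetch callable are hypothetical; the real controller
may be organised quite differently:

```python
class MonthlyCache:
    """Remember the result of an expensive per-month fetch."""

    def __init__(self, fetch):
        # fetch is a hypothetical callable(year, month) -> rows,
        # e.g. something that runs the SPARQL query for that month.
        self._fetch = fetch
        self._rows = {}

    def get(self, year, month):
        key = (year, month)
        if key not in self._rows:
            self._rows[key] = self._fetch(year, month)  # slow path, once
        return self._rows[key]
```

This keeps the generic date-range query as the source of truth and only
avoids repeating it for a month that has already been asked for.
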
] I have put the scripts, geocoded address cache and the Pylons controller
] that provides the backend service into
] https://github.com/benosteen/IUCR-Geocoding - the only thing left out is
] the sparql command I used to pull all the addresses from the endpoint,
] but I think that's simple enough to ignore for now!
] 
] It would be fantastic to add colour to the pegs in the map, each colour
] connected to a single paper in a given month - the pegs otherwise show
] authors and it is hard to see how spread out the authorship of a given
] paper is.
] 
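
One way the colouring might be done on the backend before the rows go to
the js app, as a rough sketch. It assumes each row is a dict with a 'doi'
key; the palette and the function name are arbitrary choices for
illustration:

```python
PALETTE = ["red", "blue", "green", "orange", "purple", "yellow"]  # arbitrary


def colour_by_paper(rows):
    """Return copies of the rows with a 'colour' shared by each DOI.

    Colours are handed out in order of first appearance and are reused
    once there are more papers than palette entries.
    """
    colours = {}
    coloured = []
    for row in rows:
        doi = row["doi"]
        if doi not in colours:
            colours[doi] = PALETTE[len(colours) % len(PALETTE)]
        coloured.append(dict(row, colour=colours[doi]))
    return coloured
```

All authors of the same paper then get pegs of the same colour, so the
spread of one paper's authorship is visible at a glance.
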
] Ben
] 
] 
] _______________________________________________
] openbiblio-dev mailing list
] openbiblio-dev at lists.okfn.org
] http://lists.okfn.org/mailman/listinfo/openbiblio-dev

-- 
William Waites                <mailto:ww at styx.org>
http://eris.okfn.org/ww/         <sip:ww at styx.org>
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664



