[open-science] HACKATHON-Semantic Web Identifiers for bioscience

Peter Murray-Rust pm286 at cam.ac.uk
Fri Dec 2 17:46:18 UTC 2011

Jenny has highlighted that we shall be using this list to discuss the
hackathon. I suggest we use a separate title for each thread, prefaced by
"HACKATHON-".

My problem is how to create identifiers for (say) viruses. If you look at
Wikipedia, it doesn't give IDs. But I then discovered (by chance) taxid:
which gives numbers. But the pages don't give static URIs (they contain
cgi). I am cutting and pasting the discussion (and I shall refer any others
to here).

Jerven Bolleman

Hi Peter, All,

All taxa in the UniProt taxonomy can be found via stable PURLs
(e.g. http://purl.uniprot.org/taxonomy/10305). This is synchronized with the
NCBI taxonomy and is the same in the public version (release delta excepted).
Some limited NCBI taxonomy curation happens at the Swiss-Prot group, which
also does the UniProt RDF work (guess where I work ;).
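A minimal sketch of turning a cgi-style NCBI page address into the static UniProt PURL. It assumes the NCBI Taxonomy Browser carries the taxon id in an `id` query parameter (e.g. `.../wwwtax.cgi?id=10305`); the function names are hypothetical, and the only thing taken from the message is the `http://purl.uniprot.org/taxonomy/<taxid>` pattern:

```python
from urllib.parse import urlparse, parse_qs

# Stable UniProt taxonomy PURL base, as given in the message.
UNIPROT_TAXON_BASE = "http://purl.uniprot.org/taxonomy/"


def taxid_from_ncbi_url(url: str) -> int:
    """Extract the numeric taxon id from a cgi-style NCBI URL.

    Assumption: the id travels in an 'id' query parameter.
    """
    query = parse_qs(urlparse(url).query)
    return int(query["id"][0])


def uniprot_taxon_uri(taxid: int) -> str:
    """Mint the static UniProt PURL for an NCBI taxon id."""
    if taxid <= 0:
        raise ValueError("NCBI taxon ids are positive integers")
    return f"{UNIPROT_TAXON_BASE}{taxid}"


def taxid_from_uniprot_uri(uri: str) -> int:
    """Inverse of uniprot_taxon_uri: recover the taxon id from the PURL."""
    if not uri.startswith(UNIPROT_TAXON_BASE):
        raise ValueError(f"not a UniProt taxonomy URI: {uri}")
    return int(uri[len(UNIPROT_TAXON_BASE):])
```

For example, `uniprot_taxon_uri(10305)` gives `http://purl.uniprot.org/taxonomy/10305`; the number is the NCBI one, since the two taxonomies are synchronized.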

In this case you actually have a link in RDF from the herpesvirus to its
hosts, to the proteins it encodes (which might not be all of them for each
virus isolate; e.g. in this case only a single virion membrane protein is
known), and to relevant papers as well as related virion proteins.
I would love to show you all how you can get this data in RDF and work with
it using SPARQL.
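A rough sketch of the kind of SPARQL query this makes possible, built as a Python string so the taxon id can be swapped in. `up:Protein` and `up:organism` (protein to taxon) are taken from the UniProt core ontology; the `up:host` predicate for the virus-to-host link is an assumption and should be checked against that ontology before use. No particular endpoint is assumed:

```python
# Base of the UniProt taxonomy PURLs (from the message above).
TAXON_BASE = "http://purl.uniprot.org/taxonomy/"


def virus_query(taxid: int) -> str:
    """Build a SPARQL query for the hosts of a taxon and its proteins.

    Assumption: up:host links a virus taxon to its host taxa.
    """
    # int() guards against injecting arbitrary text into the query.
    taxon = f"<{TAXON_BASE}{int(taxid)}>"
    return (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "SELECT ?host ?protein\n"
        "WHERE {\n"
        f"  {{ {taxon} up:host ?host . }}\n"
        "  UNION\n"
        f"  {{ ?protein a up:Protein ; up:organism {taxon} . }}\n"
        "}\n"
    )
```

The `UNION` keeps hosts and proteins in one result table; two separate queries would work just as well.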


M. Scott Marshall

Dear Peter and Jerven,

Nice blog with dawg and frog!

Thanks for the answer Jerven. I'm looking forward to this.

Not wanting to start (too much) commotion, but I also bumped into this
for taxa: http://rs.tdwg.org/dwc/index.htm
From recent conversations stemming from the BioHackathon in Kyoto, I
surmise that the NCBI taxonomy, given its presumable bias toward human
and model organisms and its adoption by UniProt, should be the
preferred choice for expressing taxon info in the context of
biomedical knowledge?

Next question (thinking that I know the answer): how do we integrate
datasets that use one or the other, NCBI or Darwin Core? Of course, we can
shelve this and come back to it some other year if we want to dig
into something more specifically biomedical. The issue will eventually
come back to haunt us in any case. For example, with chemical

Also, is taxon out of scope for Identifiers.org?
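One low-tech way to integrate a Darwin Core dataset with NCBI-keyed data is to join on scientific name. The sketch below assumes the Darwin Core side exposes `dwc:taxonID` and `dwc:scientificName`, and that an NCBI name-to-taxid lookup is already at hand (both hypothetical inputs). It deliberately reports unmatched and ambiguous names instead of guessing, since name strings (authorship suffixes, synonyms, homonyms) are exactly where such a join breaks:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DwcRecord:
    taxon_id: str          # value of dwc:taxonID
    scientific_name: str   # value of dwc:scientificName


def normalise(name: str) -> str:
    """Crude normalisation: collapse whitespace and lower-case."""
    return " ".join(name.split()).lower()


def link_by_name(dwc_records, ncbi_names):
    """Map Darwin Core taxonIDs to NCBI taxids by exact normalised name.

    ncbi_names: dict of scientific name -> NCBI taxid (hypothetical input).
    Returns (links, unmatched, ambiguous).
    """
    index = {}
    for name, taxid in ncbi_names.items():
        index.setdefault(normalise(name), set()).add(taxid)

    links, unmatched, ambiguous = {}, [], []
    for rec in dwc_records:
        hits = index.get(normalise(rec.scientific_name), set())
        if len(hits) == 1:
            links[rec.taxon_id] = next(iter(hits))
        elif not hits:
            unmatched.append(rec.taxon_id)
        else:
            ambiguous.append(rec.taxon_id)
    return links, unmatched, ambiguous
```

Anything landing in `unmatched` or `ambiguous` is where a curator (or a proper identifier mapping) has to step in.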


Hi All,

I don't wish to spam the mailing list about which taxonomy to use. However, if
you actually look at the Darwin Core and UniProt taxonomy "schemas", they are
very similar. And even in the taxonomy world it doesn't have that many

It's the instances that get hairy,
i.e. is it:

Degu is a Rodentia
<http://purl.uniprot.org/taxonomy/10160> rdfs:subClassOf <
Degu is a Caviomorpha
<http://purl.uniprot.org/taxonomy/10160> rdfs:subClassOf <

Which gets taxonomists all excited :)
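A small sketch of the degu example, assuming the second source places Caviomorpha under Rodentia: both placements then agree that taxon 10160 is a rodent and differ only in the intermediate class. The parent URIs are cut off in the message, so `example.org` placeholders stand in for them (hypothetical), and the transitive `rdfs:subClassOf` closure is computed by a plain graph walk rather than an RDF reasoner:

```python
DEGU = "http://purl.uniprot.org/taxonomy/10160"
# Hypothetical placeholders: the real parent URIs are truncated above.
RODENTIA = "http://example.org/taxon/Rodentia"
CAVIOMORPHA = "http://example.org/taxon/Caviomorpha"


def ancestors(node, parents):
    """All classes reachable from node via rdfs:subClassOf edges.

    parents maps each class to the set of its direct superclasses;
    the 'seen' set also makes the walk safe against cycles.
    """
    seen, stack = set(), [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Source A: degu directly under Rodentia.
source_a = {DEGU: {RODENTIA}}
# Source B: degu under Caviomorpha, which sits under Rodentia.
source_b = {DEGU: {CAVIOMORPHA}, CAVIOMORPHA: {RODENTIA}}
```

Here `ancestors(DEGU, source_a) & ancestors(DEGU, source_b)` is `{RODENTIA}`: the sources agree on the transitive answer and disagree only on the direct `rdfs:subClassOf` triple.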

Mapping schemas is easy to do here. It's mapping instances that gets the
feuds started :D
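The "easy" schema part can be as small as a predicate lookup table. The correspondences below between Darwin Core terms and UniProt core terms are assumptions to check against both vocabularies. Only predicates are rewritten, so values such as a `dwc:parentNameUsageID` string still need the harder instance mapping before the resulting `rdfs:subClassOf` triple means anything:

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
UP = "http://purl.uniprot.org/core/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed term correspondences; verify against both vocabularies.
DWC_TO_UP = {
    DWC + "scientificName": UP + "scientificName",
    DWC + "vernacularName": UP + "commonName",
    DWC + "taxonRank": UP + "rank",
    DWC + "parentNameUsageID": RDFS + "subClassOf",
}


def translate(triples):
    """Rewrite (s, p, o) predicates from Darwin Core to UniProt core.

    Unmapped predicates pass through unchanged; subjects and objects
    are never touched, which is where the instance mapping still bites.
    """
    return [(s, DWC_TO_UP.get(p, p), o) for s, p, o in triples]
```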


PMR comment - this isn't spam, it's science!
PMR - thanks for this. If we can make progress on identifiers it makes *me*
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge