[open-science-dev] Use of Datahub for open data sets
Peter Murray-Rust
pm286 at cam.ac.uk
Fri Jul 6 09:41:19 UTC 2012
I have spent time with the OKF CKAN/Datahub group this week and have become
aware of the potential of Datahub for managing and displaying datasets. As
an example I took a BMC article with 4 supplementary CSV files and created:
http://datahub.io/dataset/finding-biological-process-modifications-in-cancer-tissues-by-mining-gene-expression-correlations
(These are simply the original article captions and tables; I haven't
added anything.)
The Datahub is a well-advanced tool, and I think it has potential for
associating datasets with bibserver collections. It allows online display
of the data and can be used for aggregation and modelling. It has been used
in this way for government data, where it is highly regarded. Doing the
same for science is not trivial, as very few datasets have good enough
metadata, but identifiers such as GO terms, PDB IDs, etc. make good linkers.
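To make the "linker" idea concrete, here is a minimal sketch of pulling GO
terms out of a CSV table so that datasets mentioning the same term could be
associated. It assumes only that GO accessions look like "GO:" followed by
seven digits; the function name and the sample data are made up for
illustration. PDB IDs are left out because their short four-character form
would need more context to match reliably.

```python
import csv
import io
import re

# GO accessions have the form "GO:" followed by seven digits.
GO_TERM = re.compile(r"\bGO:\d{7}\b")


def find_go_terms(csv_text):
    """Return the sorted, de-duplicated GO accessions found in any cell."""
    found = set()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            found.update(GO_TERM.findall(cell))
    return sorted(found)


# Made-up sample table, purely for illustration.
sample = "gene,process\nTP53,GO:0006915\nBRCA1,GO:0006281; GO:0006915\n"
print(find_go_terms(sample))
```

Two datasets whose term lists overlap would then be candidates for linking.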
An achievable vision would be to crawl the open literature and extract all
CSV files into Datahub.
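As a rough sketch of one step in such a crawl: given an article title and
the links to its supplementary files, pick out the CSV files and build a
dataset description that could be handed to CKAN's dataset-creation call
(package_create in the action API). The helper names and the example.org
URLs are hypothetical, and the field names ("name", "title", "resources",
"url", "format") are my assumption of what CKAN expects, so check them
against the CKAN documentation before use. Nothing is uploaded here.

```python
import re
from urllib.parse import urlparse


def slugify(title, max_len=100):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def csv_links(urls):
    """Keep only the URLs whose path ends in .csv (case-insensitive)."""
    return [u for u in urls if urlparse(u).path.lower().endswith(".csv")]


def dataset_payload(title, urls):
    """Build a dataset description with one resource per CSV link."""
    resources = [
        {"url": u, "format": "CSV", "name": urlparse(u).path.rsplit("/", 1)[-1]}
        for u in csv_links(urls)
    ]
    return {"name": slugify(title), "title": title, "resources": resources}


# Hypothetical article title and supplementary-file links.
payload = dataset_payload(
    "Example article: gene expression tables",
    [
        "http://example.org/supp/S1.csv",
        "http://example.org/supp/S2.pdf",
        "http://example.org/supp/S3.CSV?download=1",
    ],
)
print(payload["name"], [r["name"] for r in payload["resources"]])
```

The crawling itself (finding the supplementary links in each article) is
the harder part and is not covered by this sketch.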
I will talk about this this afternoon.
If Tom or anyone else has a paper with a nice set of CSV files, I will put
them into Datahub. I'm also happy to show this off at tomorrow's hackathon.
P.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069