[Open-access] Data Set Diabetes

Peter Murray-Rust pm286 at cam.ac.uk
Sat Feb 15 22:40:21 UTC 2014

First of all, thanks for posting, Angela, and I hope you have settled well
into your initial studies and research. I hope someone has an answer for
you, but don't be surprised if not.

Finding data in any area is not easy. That's a major reason why CKAN was
started. Some disciplines (space, meteorology, some health) publish data
sets as such but most scientists either publish datasets along with a paper
or, more commonly, don't publish them at all.

I think in OKFN we should tackle this in the following ways (cf Panton

* encourage the posting of data (probably most usefully alongside papers)
* help to create metadata so the datasets can be discovered
* push journals, libraries etc. to take this seriously (very few do)

and build search engines.

What we need is a search engine for science. Google etc. don't do that as
they don't generally understand the domains. The data should probably be
transformed into a semantic form (e.g. Jim Hendler created/discovered a
million+ semantic datasets in RDF).

But there's an awful lot of data out there already, which is why I am
pushing content-mining.

And the major problem is that vested interests (e.g. academic publishers,
reference works) don't want this data discovered and published because
their business models are based on reselling the information that we have
created and that is already out there.

Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge