[open-science] Open data in life sciences use cases?

Fri Jan 6 02:53:37 UTC 2012

Dear Iain,

one area where lack of open data has been problematic is the one that
I had envisaged to specialize in before I found out that the
conservatism and categorizationism of classical non-public grant peer
review are an even more effective blocker in this regard than the lack
of data sharing.

I am talking about applications of Magnetic Resonance techniques
(mainly imaging, but with some spectroscopic components) to
cross-species comparative studies (preferably in vivo) with the
ultimate goal to better understand the evolution of specific traits. I
am calling this EvoMRI, and that's the handle I commonly go under on
the web if my real name is too long or otherwise does not fit.

For example, take a study on comparative MR-based brain morphometry in
primates: Brain scans are regularly performed on a number of primate
species, often even in large numbers per species, but typically just
one or two species per study. This is understandable, given that
housing primates requires a lot of effort and resources, and so does
operating MR scanners capable of imaging living adult primates. So
there are very few places in the world where multiple species of
primates are housed near an MRI center that is both equipped and
licensed for non-human scans (as an aside, there is no MR center in
the world that is dedicated to cross-species studies, and given the
costs involved, none is on the horizon), and ethical considerations
prevent transporting (non-human) primates long distances just to get
them an MRI scan (mobile scanners exist and are getting better all
the time, but so far, they are useless for the purposes of brain
morphometry).

There is only one publicly available dataset that has brains from
multiple primate species scanned according to a common protocol (even
broadening the search to include post-mortem studies does not give
further results), and these scans (of 11 species) were recorded (in
vivo) well over a decade ago, so they do not meet the quality criteria
that underpin more recent brain morphometric algorithms of the kind
required for cross-species studies of brain structure.

Luckily, large multi-centre brain imaging studies in humans (including
ADNI, which has repeatedly come up in data sharing discussions) have
shown that some scanning protocols can be implemented in such a way
that cross-scanner variability does not interfere with clinical
diagnoses. If pooling data from multiple scanners is indeed a valid
option within a species, it certainly makes sense for cross-species
studies too, and here's where the sharing of data (and code, btw)
comes in.

A review ( http://dx.doi.org/10.3389/neuro.11.025.2009 ) thus makes
the point that
"the major barrier to cross-species MR-based brain morphometry is not
the lack of data nor analytical tools but barriers preventing to
combine them."

Similarly, a poster ( http://dx.doi.org/10.1038/npre.2010.4511.2 )
concludes by stating
"In order to succeed, however, computational efforts on comparative
morphometry depend on high-quality imaging data from multiple species
being more widely available."

Yet there is no culture of making such datasets public, and while many
researchers in the area (especially those personally known to each
other) will happily share their data non-publicly, their individual
terms (e.g. requests for co-authorship or for other datasets or tools
or services in return) are not necessarily compatible with each other,
and in general, they also make the sharing of any derivative works
impossible.

In addition to that, some of the major software tools in the area are
free of charge but not open source and thus cannot be adapted from
their human-centric focus to cross-species applications. They took
multiple person-years to write, so starting from scratch is not
necessarily an option.

Beyond primates, I remember a paper that described MRI of seals, but
when I contacted the authors about getting the data, I got an outright
refusal rather than the request simply being ignored, which is more
common.

To sum up, cross-species studies going beyond the typical lab animals
or the typical toolkit of evolutionary biology depend on data and code
sharing, so the lack of both blocks such fields from developing. As
mentioned in the introduction, however, funding is an even more severe
blocker.

First, most funding programs are dedicated to a certain topic. For
things like comparative primate brain morphometry that can variously
be tagged into and out of biology, physics, computer science,
neuroscience, imaging and so on, the fit to basically any funding call
(or scope-limited journal, for that matter) will always be
sub-optimal, thus providing such projects with a bad starting
position.

Second, the topic of the funding scheme defines the set of reviewers
and panelists that are to judge such proposals, and they will
invariably cover only part of the scope of such hard-to-categorize
projects, especially those with a methodological focus. The only way
to mitigate these problems, in my view, is to allow interaction
between applicants and reviewers and to make those exchanges public by
default.

Third, even in those lucky cases when such projects do get funded,
they are normally impeded in ways that follow from the themes of the
funding line - if the project sailed through under "primatology", you
won't easily get expenses for number crunching equipment approved, or
if it came in under "computational neuroscience", the budget for
feeding the animals (nut crunching, perhaps?) will likely approach
zero.

Fourth, the normal funding schemes think in terms of hiring a postdoc
or so for about three years, but certain types of problems - the
Polymath project perhaps being the best-known example,
comparative brain morphometry probably another - do not necessarily
lend themselves to being solved this way, as they typically require
sets of skills that are unlikely to be adequately represented in the
set of people whose participation can be funded by such classical
hiring schemes, especially if working remotely is not an option. Plus,
some of the skills will be needed for a very small part of the
project, and much of the skill set is hard to predict in the funding
proposal anyway.

Sorry for the long post, but you hit one of my major resonances, and
much of my motivation to deal with matters of open data, open source,
public peer review or open science more generally derives from the
issues outlined above.

Daniel

On Wed, Jan 4, 2012 at 4:00 PM, Iain Hrynaszkiewicz
<Iain.Hrynaszkiewicz at biomedcentral.com> wrote:
> Dear all,
>
>
>
> I’m interested in further developing some specific use cases where open data
> (i.e. available under CC0 or equivalent terms) in journal publications would
> be useful or lack of open data has been problematic – to individual
> scientists/research groups, and perhaps even the original data publishers.
> I’m aware of reasonable evidence of societal/economic benefits for open data
> (e.g. within:
> http://ie-repository.jisc.ac.uk/279/2/JISC_data_sharing_finalreport.pdf;
> http://www.jisc.ac.uk/media/documents/publications/keepingresearchdatasafe0408.pdf)
> but more evidence (aside from more citations, in microarray research), or
> anecdotes/case studies in its absence, of benefits to individuals/groups
> would be good.
>
>
>
> E.g. “I am scientist doing X kind of experiment and being able to reuse or
> harvest all types of Y data from Z journal (or publisher) would be excellent
> because…”
>
>
>
> And ideally…
>
>
>
> “here’s why the current model prohibits or makes this difficult; or here’s
> an example of where such an approach has been beneficial previously…”
>
>
>
> I’d like to include some of these use cases as part of a white paper,
> currently well under way, on implementation of a variable license agreement
> for open access publications enabling CC0 for data (as agreed at
> http://blogs.openaccesscentral.com/blogs/bmcblog/entry/report_from_the_publishing_open).
>
>
>
> If anyone has any suggestions they would be much appreciated – and of course
> acknowledged.
>
>
>
> Best regards,
>
>
>
> Iain
>
> Iain Hrynaszkiewicz
> Journal Publisher
>
> BioMed Central
> 236 Gray's Inn Road
> London, WC1X 8HB
>
> T: +44 (0)20 3192 2175
> F: +44 (0)20 3192 2011
> M: +44 (0)782 594 0538
> W: www.biomedcentral.com
> Skype: iainh_z