[open-science] Big and open data – who should bear data transfer costs?
l.bolikowski at icm.edu.pl
Sat May 17 08:06:46 UTC 2014
When compiling a list of big, open, publicly available, data sets for my
students to use in their projects, I recently stumbled upon an
interesting problem: as the cost of transferring a large data set from A
to B is not negligible and someone has to bear that cost, what does
"open" mean in the case of "big data"?
For example, Amazon Web Services offers a treasure trove of data sets,
some under CC-BY or CC-BY-SA licenses:
Understandably, Amazon charges for data transfers out of its
infrastructure. When you rent Amazon's infrastructure in the same
region in which the interesting data set is located, you're not charged
for the transfer (but you are charged for the machines you use).
In the recent rewrite of the Panton Principles website, initiated by
Michelle Brook (http://goo.gl/cq1SuD), open research data is currently
defined as "data [...] made available on the internet under licenses
that permit anyone to download [...] without financial, legal, or
The quoted sentence is careful to require the absence of financial barriers
only in the license, so charging for data transfers seems to be
compatible with openness.
A practical question: If, as a researcher or a research organization, I
want to publish a large data set and keep the "open" label, can I charge
for data transfers (plus amortization costs of data storage), or do I
have to cover them myself?
What are your thoughts?
Dr. Łukasz Bolikowski, Assistant Professor
Centre for Open Science, ICM, University of Warsaw
Contact details: http://www.icm.edu.pl/~bolo/