[open-science] Big and open data – who should bear data transfer costs?

Lukasz Bolikowski l.bolikowski at icm.edu.pl
Sat May 17 08:06:46 UTC 2014


Dear all,

when compiling a list of big, open, publicly available data sets for my 
students to use in their projects, I recently stumbled upon an 
interesting problem: since the cost of transferring a large data set 
from A to B is not negligible and someone has to bear it, what does 
"open" mean in the case of "big data"?

For example, Amazon Web Services offers a treasure trove of data sets, 
some under CC-BY or CC-BY-SA licenses:

   http://aws.amazon.com/publicdatasets/

Understandably, Amazon charges for data transfers out of its 
infrastructure.  When you rent Amazon's infrastructure in the same 
region in which the interesting data set is located, you're not charged 
for the transfer (but you are charged for the machines you use).
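To put a number on "not negligible", here is a back-of-the-envelope 
sketch in Python. The flat egress price of $0.09/GB is an illustrative 
assumption only; real AWS pricing is tiered, region-dependent, and 
changes over time:

   # Rough cost of pulling a data set out of Amazon's infrastructure once.
   # ASSUMPTION: a flat egress price of USD 0.09/GB, for illustration only;
   # real AWS pricing is tiered and changes over time.
   def transfer_cost_usd(size_gb, price_per_gb=0.09):
       """Cost of one outbound transfer of `size_gb` gigabytes."""
       return size_gb * price_per_gb

   print(transfer_cost_usd(1000))    # 1 TB  -> 90.0 USD
   print(transfer_cost_usd(50000))   # 50 TB -> 4500.0 USD

Even at such modest per-gigabyte rates, serving a multi-terabyte data 
set to many downloaders adds up quickly.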

In the recent rewrite of the Panton Principles website, initiated by 
Michelle Brook (http://goo.gl/cq1SuD), open research data is currently 
defined as "data [...] made available on the internet under licenses 
that permit anyone to download [...] without financial, legal, or 
technical barriers".

The quoted sentence is careful to require the absence of financial 
barriers only in the license itself, so charging for data transfers 
seems to be compatible with openness.

A practical question: If, as a researcher or a research organization, I 
want to publish a large data set and keep the "open" label, can I charge 
for data transfers (plus the amortized cost of data storage), or do I 
have to cover these costs myself?
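For concreteness, the costs in question might be framed with a simple 
model like the one below; both prices are illustrative assumptions, not 
quotes from any provider:

   # ASSUMPTION: illustrative prices, not actual quotes from any provider.
   STORAGE_PER_GB_MONTH_USD = 0.03   # amortized storage
   EGRESS_PER_GB_USD = 0.09          # outbound transfer

   def monthly_cost_usd(size_gb, downloads_per_month):
       """(storage, egress) cost per month of hosting one data set."""
       storage = size_gb * STORAGE_PER_GB_MONTH_USD
       egress = size_gb * downloads_per_month * EGRESS_PER_GB_USD
       return storage, egress

   # A 1 TB data set downloaded 10 times a month:
   print(monthly_cost_usd(1000, 10))  # -> (30.0, 900.0)

Note that the storage term is fixed for the publisher, while the egress 
term scales with demand, which is exactly the part one might want to 
pass on to downloaders.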

What are your thoughts?

Best regards,

Lukasz

-- 
Dr. Łukasz Bolikowski, Assistant Professor
Centre for Open Science, ICM, University of Warsaw
Contact details: http://www.icm.edu.pl/~bolo/


