[open-science] Big and open data – who should bear data transfer costs?

Sat May 17 15:29:23 UTC 2014

On 05/17/2014 05:03 PM, P Kishor wrote:
> Talking of impractical, there is little reason one should be downloading
> 200 TB of raw data, and even less reason for many others wanting to do
> the same, if for no other reason than that not many have 200 TB of space
> lying around [...]

One reason I can think of is to create mirrors of popular resources in 
research data centres all over the world and provide data analysis 
services to local research communities (my organization is currently 
building a research data centre with capabilities of storing and 
analysing data sets of that volume).

> Nevertheless, recouping for cost above and beyond what may be budgeted
> as part of the mission of the offering org sounds justified. The bottom
> line is, nothing is free even if it is open, and seems free. The only
> thing we don't want is double-charging.

Very reasonable and practical approach.

I'm still not sure, though, how to classify data sets on AWS (open or 
not?).  If I were a for-profit company like Amazon, I would probably 
provide financial incentives to use my infrastructure and discourage 
transfers outside.  Peter mentioned earlier "transparent and acceptable" 
costs as requirements for openness.  It's unrealistic to expect the 
level of financial transparency from Amazon that would allow us to judge 
whether data made available via AWS is "open".  After all, IMHO there is 
no social or legal contract that would bind Amazon to disclose 
financial details of their policy on data transfer charges.

Best regards,

Lukasz

-- 
Dr. Łukasz Bolikowski, Assistant Professor
Centre for Open Science, ICM, University of Warsaw
Contact details: http://www.icm.edu.pl/~bolo/