[open-science] Big and open data – who should bear data transfer costs?

Sat May 17 15:03:52 UTC 2014

On Sat, May 17, 2014 at 7:51 AM, Lukasz Bolikowski
<l.bolikowski at icm.edu.pl> wrote:

> As an example, let's take the 1000 Genomes data set (
> http://aws.amazon.com/datasets/4383) with over 200 TB of data available
> on Amazon Web Services.  With transfer rate capped at 1 MB/s (not
> unreasonable for the free plan in a freemium model) it would take over 6
> years to download it.  Using BitTorrent could *somewhat* help the next
> downloaders, but the first one would still have to wait over 6 years for
> their download to complete!
>

You are forgetting to factor in the bandwidth of the downloader. Even at
gigabit speed (on an uncongested university network), it would take me
almost 450 hours to download 200 TB, and most likely many times longer. At
a more realistic download speed, the wait would be impractically long.
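
For anyone who wants to check the arithmetic behind both figures, here is
a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 MB =
10^6 bytes), a perfectly sustained transfer rate, and no protocol
overhead; the function and variable names are mine, for illustration only.

```python
# Back-of-the-envelope transfer times for a 200 TB data set.
# Assumptions: decimal units, a constant sustained rate, no protocol overhead.

TB = 10**12  # bytes in a terabyte (decimal)
MB = 10**6   # bytes in a megabyte (decimal)


def transfer_time_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the time in seconds to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


size = 200 * TB

# Server-side cap of 1 MB/s, as in the quoted example.
capped = transfer_time_seconds(size, 1 * MB)

# Client-side gigabit link: 10^9 bits/s is 125 * 10^6 bytes/s.
gigabit = transfer_time_seconds(size, 10**9 / 8)

print(f"1 MB/s:   {capped / (365 * 24 * 3600):.1f} years")   # 6.3 years
print(f"1 Gbit/s: {gigabit / 3600:.0f} hours")               # 444 hours
```

Real-world throughput would be lower than either idealized rate, so both
numbers are best read as lower bounds.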

Speaking of impractical, there is little reason for anyone to download
200 TB of raw data, and even less reason for many others to want to do the
same, if for no other reason than that not many have 200 TB of space lying
around (yes, I understand you are using 200 TB for illustrative purposes,
but replace that number with another, more reasonable big number and the
same arguments would still apply). Hopefully there will be customizable
analytical services that allow one to derive one's own insights from the
data without needing to hoard it in its entirety.

Nevertheless, recouping costs above and beyond what may be budgeted as
part of the offering organization's mission sounds justified. The bottom
line is that nothing is free, even if it is open and seems free. The only
thing we don't want is double-charging.

-- 
Puneet Kishor