[open-science] Big and open data – who should bear data transfer costs?

Peter Murray-Rust pm286 at cam.ac.uk
Sat May 17 15:12:58 UTC 2014

We can separate:
 - what is desirable
 - and what is the label

If we are using the label "Open" as in the Open Definition (
http://opendefinition.org/od/ ), then there is an allowance for a reproduction cost:

The Definition

A work is open if its manner of distribution satisfies the following conditions:
1. Access

The work shall be available as a whole and at no more than a reasonable
reproduction cost, preferably downloading via the Internet without charge.
The work must also be available in a convenient and modifiable form.

*Comment: This can be summarized as ‘social’ openness – not only are you
allowed to get the work but you can get it. ‘As a whole’ prevents the
limitation of access by indirect means, for example by only allowing access
to a few items of a database at a time (material should be available in
bulk as necessary). Convenient and modifiable means that material should be
machine readable (rather than, for example, just human readable).*

By this definition it is allowable to charge for reproduction and still
label the work "Open". Once obtained, the work can then be further copied
without charge.


On Sat, May 17, 2014 at 4:03 PM, P Kishor <punk.kish at gmail.com> wrote:

> On Sat, May 17, 2014 at 7:51 AM, Lukasz Bolikowski <
> l.bolikowski at icm.edu.pl> wrote:
>> As an example, let's take the 1000 Genomes data set (
>> http://aws.amazon.com/datasets/4383) with over 200 TB of data available
>> on Amazon Web Services.  With transfer rate capped at 1 MB/s (not
>> unreasonable for the free plan in a freemium model) it would take over 6
>> years to download it.  Using BitTorrent could *somewhat* help the next
>> downloaders, but the first one would still have to wait over 6 years for
>> their download to complete!
> You are forgetting to factor in the bandwidth of the downloader. Even at
> gigabit speed (on an uncongested university network), it would take me
> almost 450 hrs to download 200 TB, but most likely many times longer. At a
> more realistic download speed it would be impractically more.
> Talking of impractical, there is little reason one should be downloading
> 200 TB of raw data, and even less reason for many others wanting to do the
> same, if for no other reason than that not many have 200 TB of space lying
> around (yes, I understand you are using 200 TB for illustrative purposes,
> but replace that number with another, more reasonable big number and still
> the same arguments would apply). Hopefully there will be customizable
> analytical services that would allow creating one's own analytical insights
> into the data without needing to hoard it in entirety.
> Nevertheless, recouping for cost above and beyond what may be budgeted as
> part of the mission of the offering org sounds justified. The bottom line
> is, nothing is free even if it is open, and seems free. The only thing we
> don't want is double-charging.
> --
> Puneet Kishor
> _______________________________________________
> open-science mailing list
> open-science at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/open-science
> Unsubscribe: https://lists.okfn.org/mailman/options/open-science
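
The transfer-time figures quoted above are easy to verify with a
back-of-envelope sketch (a minimal illustration, assuming idealised
sustained rates with no protocol overhead, 1 TB = 10^12 bytes, and
1 Gbit/s = 125 MB/s):

```python
# Back-of-envelope check of the transfer times discussed in the thread.
# Assumptions: sizes in decimal units (1 TB = 10**12 bytes), and a link
# that sustains its nominal rate with no protocol or disk overhead.

def transfer_time_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Idealised transfer time: total size divided by sustained rate."""
    return size_bytes / rate_bytes_per_s

DATASET = 200 * 10**12  # the 1000 Genomes example: 200 TB in bytes

slow = transfer_time_seconds(DATASET, 10**6)        # capped at 1 MB/s
fast = transfer_time_seconds(DATASET, 125 * 10**6)  # 1 Gbit/s = 125 MB/s

print(f"At 1 MB/s:   {slow / 86400 / 365.25:.1f} years")
print(f"At 1 Gbit/s: {fast / 3600:.0f} hours")
```

This reproduces both numbers in the thread: roughly 6.3 years at the
1 MB/s cap, and roughly 444 hours even on an uncongested gigabit link.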

Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
