[open-science] Big and open data – who should bear data transfer costs?

Peter Murray-Rust pm286 at cam.ac.uk
Sat May 17 15:12:58 UTC 2014


We can separate:
 - what is desirable
 - and what is the label

If we are using the label "Open" as in the Open Definition (
http://opendefinition.org/od/ ) then there is allowance for a reproduction
charge:

The Definition

A work is open if its manner of distribution satisfies the following
conditions:
1. Access

The work shall be available as a whole and at no more than a reasonable
reproduction cost, preferably downloading via the Internet without charge.
The work must also be available in a convenient and modifiable form.

*Comment: This can be summarized as ‘social’ openness – not only are you
allowed to get the work but you can get it. ‘As a whole’ prevents the
limitation of access by indirect means, for example by only allowing access
to a few items of a database at a time (material should be available in
bulk as necessary). Convenient and modifiable means that material should be
machine readable (rather than, for example, just human readable).*

By this definition it is permissible to charge a reasonable reproduction
cost and still label the work "Open".

Once obtained, the work can be copied further without charge.

P.



On Sat, May 17, 2014 at 4:03 PM, P Kishor <punk.kish at gmail.com> wrote:

>
> On Sat, May 17, 2014 at 7:51 AM, Lukasz Bolikowski <
> l.bolikowski at icm.edu.pl> wrote:
>
>> As an example, let's take the 1000 Genomes data set (
>> http://aws.amazon.com/datasets/4383) with over 200 TB of data available
>> on Amazon Web Services.  With transfer rate capped at 1 MB/s (not
>> unreasonable for the free plan in a freemium model) it would take over 6
>> years to download it.  Using BitTorrent could *somewhat* help the next
>> downloaders, but the first one would still have to wait over 6 years for
>> their download to complete!
>>
>
>
>
> You are forgetting to factor in the bandwidth of the downloader. Even at
> gigabit speed (on an uncongested university network), it would take me
> almost 450 hrs to download 200 TB, and most likely many times longer. At a
> more realistic download speed it would take impractically long.
>
> Talking of impractical, there is little reason one should be downloading
> 200 TB of raw data, and even less reason for many others wanting to do the
> same, if for no other reason than that not many have 200 TB of space lying
> around (yes, I understand you are using 200 TB for illustrative purposes,
> but replace that number with another, more reasonable big number and still
> the same arguments would apply). Hopefully there will be customizable
> analytical services that would allow creating one's own analytical insights
> into the data without needing to hoard it in its entirety.
>
> Nevertheless, recouping costs above and beyond what may be budgeted as
> part of the mission of the offering org sounds justified. The bottom line
> is, nothing is free even if it is open and seems free. The only thing we
> don't want is double-charging.
>
>
> --
> Puneet Kishor
>
>
>
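
For anyone wanting to check the transfer times quoted above (over 6 years
at 1 MB/s, almost 450 hrs at gigabit speed), here is a rough sketch of the
arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 MB/s = 10^6
bytes/s, 1 Gbit/s = 10^9 bits/s) and a link that stays fully saturated with
no protocol overhead; the function and constant names are illustrative only.

```python
# Back-of-the-envelope transfer times for a 200 TB data set.
# Assumptions: decimal units, a fully saturated link, no protocol overhead.

SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * SECONDS_PER_HOUR


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Return the time in seconds to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


DATASET_BYTES = 200 * 10**12      # 200 TB
CAPPED_RATE = 10**6               # 1 MB/s, the capped rate in the example
GIGABIT_RATE = 10**9 / 8          # 1 Gbit/s expressed in bytes per second

capped = transfer_seconds(DATASET_BYTES, CAPPED_RATE)
gigabit = transfer_seconds(DATASET_BYTES, GIGABIT_RATE)

print(f"At 1 MB/s:   {capped / SECONDS_PER_YEAR:.1f} years")
print(f"At 1 Gbit/s: {gigabit / SECONDS_PER_HOUR:.0f} hours")
```

Under these assumptions both figures in the thread hold up; real transfers
would take longer once overhead and congestion are counted.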


-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069