[open-science] Big and open data – who should bear data transfer costs?

Sat May 17 09:48:43 UTC 2014

I'd say you can charge for bandwidth - as long as it's very clear that
people are not paying for the open license to use the data, and will be
able to do whatever they want with the dataset once they've obtained it.
They're paying for the transfer mechanism, like paying for CDs loaded with
free and open source software.

Also, I think http://figshare.com/ allows sharing datasets. Not familiar
with them myself though.

Some ramblings on how you might do this in practice:

I foresee a difficulty in actually charging people. In effect you're
running a mini-business. Beyond the theoretical "could I charge" there is
the "how to charge" question. What would be the best way - ask for a small
donation before each download? But then you need to facilitate this.

Perhaps PayPal can help, seeing how their coverage is quite large. But
PayPal does not cover *all* of the world, so requiring payment would shut
some people out. This could be acceptable to you if it's the only way to
make the dataset available.

Then you need to keep track of money coming in vs. your expenses for
hosting the dataset. Ideally they should match and it should all be quite
simple - but you would need to do it. Would it sap more and more of your
time if you host several datasets? Possibly. If you pay for server costs
through PayPal too, everything would be in one place, which could save a
lot of time on keeping track of everything.
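
A rough sketch of that bookkeeping in Python, in case it helps. The function
names and every figure below are made-up placeholders (not real hosting or
PayPal prices), and payment processing fees are ignored:

```python
def transfer_cost(dataset_gb, price_per_gb):
    """Cost of sending one copy of the dataset to a downloader."""
    return dataset_gb * price_per_gb


def break_even_fee(dataset_gb, price_per_gb, monthly_storage_cost,
                   downloads_per_month):
    """Fee per download that covers transfer plus a share of storage."""
    if downloads_per_month <= 0:
        raise ValueError("need at least one download per month")
    storage_share = monthly_storage_cost / downloads_per_month
    return transfer_cost(dataset_gb, price_per_gb) + storage_share


def monthly_balance(fees_received, expenses):
    """Money in minus money out; ideally this stays close to zero."""
    return sum(fees_received) - sum(expenses)


# Placeholder numbers only: a 400 GB dataset, 0.25 per GB transferred,
# 20 per month for storage, 10 downloads expected per month.
fee = break_even_fee(400, 0.25, 20, 10)

# Ten downloads at that fee against transfer and storage expenses.
balance = monthly_balance([fee] * 10, [10 * transfer_cost(400, 0.25), 20])
```

If the balance drifts away from zero, the expected number of downloads was
off and the fee needs adjusting, which is the ongoing effort I mean.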

I guess if you're willing to put in the effort to set this up (as opposed
to just sticking files on S3 or a university server and sharing the link)
then it would be fine. It's just a question of how long you think it'll
take you to do it *this* way and whether you want to spend your time on it.

Greetings,
Emanuil

On 17 May 2014 09:06, Lukasz Bolikowski <l.bolikowski at icm.edu.pl> wrote:

> Dear all,
>
> when compiling a list of big, open, publicly available data sets for my
> students to use in their projects, I recently stumbled upon an interesting
> problem: as the cost of transferring a large data set from A to B is not
> negligible and someone has to bear that cost, what does "open" mean in the
> case of "big data"?
>
> For example, Amazon Web Services offer a treasure trove of data sets, some
> under CC-BY or CC-BY-SA licenses:
>
>   http://aws.amazon.com/publicdatasets/
>
> Understandably, Amazon charges for data transfers out of its
> infrastructure.  When you rent Amazon's infrastructure in the same region
> in which the interesting data set is located, you're not charged for the
> transfer (but you are charged for the machines you use).
>
> In the recent rewrite of the Panton Principles website, initiated by
> Michelle Brook (http://goo.gl/cq1SuD), open research data is currently
> defined as "data [...] made available on the internet under licenses that
> permit anyone to download [...] without financial, legal, or technical
> barriers".
>
> The quoted sentence is careful to require lack of financial barriers only
> in the license, so charging for data transfers seems to be compatible with
> openness.
>
> A practical question: If, as a researcher or a research organization, I
> want to publish a large data set and keep the "open" label, can I charge
> for data transfers (plus amortization costs of data storage), or do I have
> to cover them myself?
>
> What are your thoughts?
>
> Best regards,
>
> Lukasz
>
> --
> Dr. Łukasz Bolikowski, Assistant Professor
> Centre for Open Science, ICM, University of Warsaw
> Contact details: http://www.icm.edu.pl/~bolo/