[okfn-discuss] torrents for sharing datasets

Stephen Welburn s.welburn at qmul.ac.uk
Tue Apr 14 09:54:39 UTC 2015


Using torrents for (research) data shifting was discussed (briefly) a
couple of years ago as part of the JISC (UK) research data management
programme.

Largely, the idea was that repositories should support a resumable,
large-data-friendly protocol for ingesting data, rather than trying to
force data ingests through an HTTP channel. The obvious solution designed
for this sort of use was torrents. The main worry was that sysadmins would
be less torrent-friendly than data users: you pretty much need to either
allow torrent traffic on your network or whitelist seeders. Torrents also
deal nicely with other aspects, such as mirroring data between sites,
provided you can get a federation of sites to agree to seed all torrents
that are added to the tracker.
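
To illustrate why torrents count as "resumable": BitTorrent (v1) splits the
data into fixed-size pieces and records a SHA-1 hash for each, so an
interrupted transfer only needs to re-fetch the pieces that are missing or
fail verification. The sketch below is a rough illustration only; the piece
size and the function names are my own assumptions, not taken from any
repository software or torrent client.

```python
import hashlib

# Assumed piece size for illustration; real torrents choose their own.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Return the SHA-1 digest of each fixed-size piece of data."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def missing_pieces(
    partial: bytes, expected: list[bytes], piece_length: int = PIECE_LENGTH
) -> list[int]:
    """Indices of pieces that are absent or fail verification in a partial copy."""
    have = piece_hashes(partial, piece_length)
    return [
        index
        for index, digest in enumerate(expected)
        if index >= len(have) or have[index] != digest
    ]
```

A receiver that was cut off part-way through would call missing_pieces() on
whatever it has on disk and request only those indices, from any seeder.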

Specifically, we thought that the SWORDv2 protocol for data ingest could
support pulling data from the client rather than the client being
responsible for pushing their data to the repository.
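
As a rough sketch of what "pulling" might look like: the client could hand
the repository a small reference to the torrent (for example a magnet link)
instead of the data itself, and the repository would then fetch the data as
an ordinary peer. The helper below only builds such a link. The function
name and the example tracker URL are hypothetical, and nothing here is part
of the SWORDv2 specification.

```python
from urllib.parse import quote, urlencode


def magnet_link(info_hash_hex: str, name: str, trackers: list[str]) -> str:
    """Build a BitTorrent magnet URI from a hex info-hash, a display name
    and a list of tracker announce URLs (hypothetical helper)."""
    params = [("dn", name)] + [("tr", tracker) for tracker in trackers]
    # quote (rather than the default quote_plus) encodes spaces as %20.
    query = urlencode(params, quote_via=quote)
    return f"magnet:?xt=urn:btih:{info_hash_hex}&{query}"


# Example with a placeholder info-hash and an assumed tracker URL.
link = magnet_link("a" * 40, "my data", ["http://tracker.example/announce"])
```

The deposit itself would then be a small metadata record carrying this link,
which sidesteps pushing tens of GB through a single HTTP request.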

Video: https://vimeo.com/41849671
Slides: http://www.slideshare.net/shellac/sword2-and-bittorrent

Having said all that, it was an idea which we didn't get to take further.

Steve Welburn

--
Steve Welburn,
Research Consultant - IT Services Research
Queen Mary University of London
Tel: +44 (0)20 7882 6939
s.welburn at qmul.ac.uk




On 13/04/2015 16:35, "stef" <s at ctrlc.hu> wrote:

>ohi,
>
>i was discussing with a fellow about some data that i help them
>exploiting. i set up a few years ago some cronjob to archive daily
>changing urls. fantastic eu historical transparency data unavailable
>anywhere else, contains even references to the illuminati. however
>(oblig kraaken ref) we're no better than the kraaken, in publishing what
>we collect. i want to change that, i want to publish this. however the
>dataset is quite big, and i have other similar datasets that are in the
>few tens of GB ballpark. so i was thinking maybe a pirat^Wdatabay and
>some organizational seed-servers would be something that would be most
>efficient in data-sharing. my questions:
>
>1/ has this been considered already? if so, why has it been rejected?
>2/ has this been implemented, and i can join such an effort?
>
>my contrib: i'd be happy to share this dataset i was alluding to, also
>i'd share all my parltrack data, and i'd contribute at least one seed
>server for this venture.
>
>thx for any hints or ready-to-join initiatives,s
>
>-- 
>otr fp: https://www.ctrlc.hu/~stef/otr.txt



