[open-science] BioTorrents

John Wilbanks wilbanks at creativecommons.org
Sat Apr 17 22:28:59 UTC 2010

Also, be sure to include the Tranche Project in anything on this topic. I
think they're a long ways towards the goal.



On Sat, Apr 17, 2010 at 10:19 AM, Tom Moritz <tom.moritz at gmail.com> wrote:

> Sorry to be just jumping in without tracking regularly -- but there are
> some Stateside projects seeking to solve the problem that Rufus describes.
> If I'm not mistaken, this is precisely what iRODS (formerly at San Diego
> Supercomputer Center, now migrated to Univ of North Carolina) has set out to
> address [SEE: https://www.irods.org/ ], and the NSF-supported TeraGrid
> has been grappling with this as well [SEE: https://www.teragrid.org/ ],
> and similarly the Open Science Grid [SEE:
> http://www.opensciencegrid.org/About/News_Archive/Open_Science_Grid_Receives_30_Million_Dollar_Award]
> I have been in some discussions in past weeks and months with UNFCCC, US
> EPA, and others about how best to manage at least foundational data sets
> ("canonical"?) while providing precisely the level of transparency and
> accountability that was obviously necessary in the recent IPCC dust-up... I
> believe that we may be best off picking certain such data and thoroughly
> modeling best practice...???
> Tom
> On Fri, Apr 16, 2010 at 1:41 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
>> On 16 April 2010 20:03, Jonathan Gray <jonathan.gray at okfn.org> wrote:
>> > Piece about BioTorrents on Nature blog:
>> >
>> > http://blogs.nature.com/news/thegreatbeyond/2010/04/improving_the_portability_of_d_1.html
>> There used to be something like this for geodata (geotorrents.org) but
>> it has now disappeared. We've thought about torrent stuff quite a lot
>> before for data distribution [1] [2]. The problem with bittorrent (at
>> least in the small-time experiments we did) is it provides no
>> mechanism to do the storage-allocation you need for a real data
>> "grid", specifically:
>> a) How do you deal with large files (GBs) which individual peers may
>> not want to be responsible for holding and sharing in their entirety?
>> The obvious answer is sharding, but bittorrent has no way of sharding
>> parts of a given file (so you need some mechanism above bittorrent to
>> do this).
>> b) (the biggie) How do you do file/load allocation and rebalancing to
>> ensure you don't lose data as peers enter and leave the network? Even
>> with lots of participants, how do you decide what files to allocate to
>> whom, and how do you coordinate changes over time so you don't end up
>> with everyone hosting the same 1 file?
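To make points (a) and (b) concrete, here is a minimal sketch of one possible layer above bittorrent, under stated assumptions. It splits a file into fixed-size shards and assigns each shard to a few peers with rendezvous (highest-random-weight) hashing. This is a swapped-in illustrative technique, not something bittorrent, Tahoe, or the OKFN experiments are stated to use, and the function names, peer names, and shard size are all hypothetical.

```python
import hashlib


def shard_ranges(file_size, shard_size):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    return [(off, min(shard_size, file_size - off))
            for off in range(0, file_size, shard_size)]


def score(peer, shard_id):
    """Deterministic pseudo-random weight for a (peer, shard) pair."""
    digest = hashlib.sha256(f"{peer}:{shard_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(shard_id, peers, replicas):
    """Pick the `replicas` highest-scoring peers to hold this shard."""
    ranked = sorted(peers, key=lambda p: score(p, shard_id), reverse=True)
    return ranked[:replicas]


def placement(file_name, file_size, shard_size, peers, replicas):
    """Map each shard id (hypothetical 'name#index' format) to its holders."""
    count = len(shard_ranges(file_size, shard_size))
    return {f"{file_name}#{i}": assign(f"{file_name}#{i}", peers, replicas)
            for i in range(count)}


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]
    # A 5 GB file in 1 GB shards, each held by 2 peers.
    for shard, holders in placement("genome.tar", 5 * 2**30, 2**30, peers, 2).items():
        print(shard, holders)
```

Because each peer's score for a shard does not depend on who else is in the network, a peer leaving only forces re-placement of the shards it actually held; the remaining holders of every shard stay where they are. That addresses the "who hosts what" question in (b), but not the harder social problem of getting peers to accept the storage they are assigned.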
>> What you really need here is a proper wide-area distributed storage
>> solution. We tried to build something along these lines last year
>> running a tahoe grid: <http://grid.okfn.org/> (more info in [3],
>> requirements in [2])
>> I would still say this is a (long) way from success due to the various
>> social and technical issues involved:
>> * you need your grid software to be (very) easy to install
>> * you may have significant storage and b/w impositions on users
>> * (most significant) you really need *big* scale to provide
>> reliability and availability -- unlike with, say, distributed processing
>> projects, where activity can happen any old time and people can
>> allocate their processing whenever they want. With a storage grid you
>> really need either a) massive scale or b) strong commitment to
>> participation from peers, so that groups of users going
>> offline or leaving the grid don't compromise availability.
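The scale-versus-commitment trade-off in the last bullet can be put in rough numbers. The sketch below assumes a file is stored as n shares of which any k suffice to rebuild it (a k-of-n scheme of the kind Tahoe uses, as far as I understand it), and that each peer is online independently with probability p. The independence assumption is optimistic: the point above is precisely that groups of users tend to go offline together. The function name and the sample parameters are illustrative only.

```python
from math import comb


def availability(n, k, p):
    """Probability that at least k of n shares are reachable.

    Assumes one share per peer and that each peer is online
    independently with probability p (a simplifying assumption).
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Compare casual peers (low uptime) with committed peers (high uptime)
    # for a few hypothetical k-of-n settings.
    for p in (0.3, 0.5, 0.9):
        for n, k in ((3, 1), (10, 3), (30, 9)):
            print(f"p={p} n={n} k={k} availability={availability(n, k, p):.4f}")
```

Running it for a few values of p shows the two routes named above: with low per-peer uptime you need many more shares spread over many more peers to reach a given availability, whereas a small set of highly reliable peers gets there with far fewer.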
>> [1]: http://wiki.okfn.org/p/Data_Distribution
>> [2]: http://wiki.okfn.org/p/Distributed_Storage
>> [3]: http://wiki.okfn.org/p/Distributed_Storage/Plan
>> > Would be interesting to liaise with them re: registry of open data
>> > (i.e. CKAN and suchlike...).
>> It would be no problem to create ckan packages linking to the relevant
>> torrents. Anyone with contacts with the biotorrents people so we could
>> have a chat?
>> Rufus
>> _______________________________________________
>> open-science mailing list
>> open-science at lists.okfn.org
>> http://lists.okfn.org/mailman/listinfo/open-science