[open-science] BioTorrents

Tom Moritz tom.moritz at gmail.com
Sat Apr 17 17:19:12 UTC 2010


Sorry to be just jumping in without tracking regularly -- but there are some
Stateside projects seeking to solve the problem that Rufus describes. If
I'm not mistaken, this is precisely what iRODS (formerly at the San Diego
Supercomputer Center, now migrated to the Univ of North Carolina) has set out
to address [SEE: https://www.irods.org/ ], and the NSF-supported TeraGrid
has been grappling with this as well [SEE: https://www.teragrid.org/ ],
as has the Open Science Grid [SEE:
http://www.opensciencegrid.org/About/News_Archive/Open_Science_Grid_Receives_30_Million_Dollar_Award ].

I have been in some discussions in past weeks and months with UNFCCC, US
EPA, and others about how best to manage at least foundational
("canonical"?) data sets while providing precisely the level of
transparency and accountability that was obviously necessary in the recent
IPCC dust-up. I believe that we may be best off picking certain such data
sets and thoroughly modeling best practice with them?

Tom

Tom Moritz
1968 1/2 South Shenandoah Street,
Los Angeles, California 90034
USA
+1 310 963 0199 (cell)
tommoritz (Skype)
http://www.linkedin.com/in/tmoritz

“Πάντα ῥεῖ καὶ οὐδὲν μένει” (Everything flows, nothing stands still.)
--Heraclitus

On Fri, Apr 16, 2010 at 1:41 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:

> On 16 April 2010 20:03, Jonathan Gray <jonathan.gray at okfn.org> wrote:
> > Piece about BioTorrents on Nature blog:
> >
> > http://blogs.nature.com/news/thegreatbeyond/2010/04/improving_the_portability_of_d_1.html
>
> There used to be something like this for geodata (geotorrents.org) but
> it has now disappeared. We've thought about torrent stuff quite a lot
> before for data distribution [1] [2]. The problem with bittorrent (at
> least in the small-time experiments we did) is it provides no
> mechanism to do the storage-allocation you need for a real data
> "grid", specifically:
>
> a) How do you deal with large files (GBs) which individual peers may
> not want to be responsible for holding and sharing in their entirety?
> The obvious answer is sharding, but bittorrent has no way of sharding
> parts of a given file (so you need some mechanism above bittorrent to
> do this).
>
> b) (the biggie) How do you do file/load allocation and rebalancing to
> ensure you don't lose data as peers enter and leave the network? Even
> with lots of participants, how do you decide what files to allocate to
> whom, and how do you coordinate changes over time so you don't end up
> with everyone hosting the same one file?
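
One possible shape for the layer "above bittorrent" that points a) and b) ask for is sketched below. It is only an illustration under stated assumptions, not something the grid described in this thread is known to have used: files are cut into fixed-size shards, and each shard is assigned to a few peers by rendezvous (highest-random-weight) hashing. The file name, peer names, shard size, and replica count are all hypothetical.

```python
import hashlib

def shard_ids(file_name, file_size, shard_size):
    """Name each fixed-size shard of a file (the last shard may be shorter)."""
    count = (file_size + shard_size - 1) // shard_size  # ceiling division
    return [f"{file_name}#{i}" for i in range(count)]

def score(peer, shard):
    """Deterministic pseudo-random weight for a (peer, shard) pair."""
    digest = hashlib.sha256(f"{peer}|{shard}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def holders(shard, peers, replicas):
    """Pick the `replicas` peers with the highest weight for this shard."""
    ranked = sorted(peers, key=lambda p: score(p, shard), reverse=True)
    return ranked[:replicas]

def allocate(shards, peers, replicas=3):
    """Map every shard to the peers that should hold a copy of it."""
    return {s: holders(s, peers, replicas) for s in shards}

# Hypothetical example: a 5 GiB file in 256 MiB shards across 10 peers.
peers = [f"peer{i}" for i in range(10)]
shards = shard_ids("genome.tar", 5 * 2**30, 256 * 2**20)

before = allocate(shards, peers)
after = allocate(shards, [p for p in peers if p != "peer3"])  # peer3 leaves

# Only shards that peer3 was holding get a new holder list.
moved = [s for s in shards if before[s] != after[s]]
print(f"{len(moved)} of {len(shards)} shards need re-replication")
```

Because every peer can compute the same ranking independently, no central coordinator has to decide who holds what, and a departure only disturbs the shards the departing peer held. It does not address the harder social questions (whether peers accept the storage and bandwidth cost), only the bookkeeping.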
>
> What you really need here is a proper wide-area distributed storage
> solution. We tried to build something along these lines last year
> running a tahoe grid: <http://grid.okfn.org/> (more info in [3],
> requirements in [2])
>
> I would still say this is a (long) way from success due to the various
> social and technical issues involved:
>
> * you need your grid software to be (very) easy to install
> * you may have significant storage and b/w impositions on users
> * (most significant) you really need *big* scale to provide
> reliability and availability -- unlike, say, distributed processing
> projects, where activity can happen any old time and people can
> allocate their processing whenever they want. With a storage grid you
> really need either a) massive scale or b) strong commitment to
> participation from peers, so that groups of users going offline or
> leaving the grid don't compromise availability.
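
The availability point can be made concrete with a rough calculation. The sketch below assumes a k-of-n erasure-coded file (any k of its n shares are enough to rebuild it) and, importantly, that each share's peer is online independently with probability p. The 3-of-10 figures are illustrative, and the independence assumption is exactly what breaks when groups of users go offline together.

```python
from math import comb

def availability(n, k, p):
    """Probability that at least k of n shares are reachable.

    Assumes each share sits on a different peer and that peers are
    online independently with probability p (a simplifying assumption).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Compare one full copy on one peer with a 3-of-10 encoding.
for p in (0.5, 0.7, 0.9):
    single = availability(1, 1, p)
    coded = availability(10, 3, p)
    print(f"peer uptime {p:.0%}: single copy {single:.4f}, 3-of-10 {coded:.4f}")
```

With casual peers who are online half the time, a single copy is reachable half the time, while the 3-of-10 encoding is reachable roughly 94.5% of the time under these assumptions. Getting much higher than that takes either more shares spread over more peers (scale) or higher per-peer uptime (commitment).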
>
> [1]: http://wiki.okfn.org/p/Data_Distribution
> [2]: http://wiki.okfn.org/p/Distributed_Storage
> [3]: http://wiki.okfn.org/p/Distributed_Storage/Plan
>
> > Would be interesting to liaise with them re: registry of open data
> > (i.e. CKAN and suchlike...).
>
> It would be no problem to create ckan packages linking to the relevant
> torrents. Anyone with contacts with the biotorrents people so we could
> have a chat?
>
> Rufus