[open-science] BioTorrents

Jonathan Gray jonathan.gray at okfn.org
Thu Apr 22 09:39:53 UTC 2010


This looks great, John! I hadn't seen this before. Do you know anyone
from the project that we could invite to the WG on open data in
science?

All the best,

Jonathan

On Sat, Apr 17, 2010 at 11:28 PM, John Wilbanks
<wilbanks at creativecommons.org> wrote:
> Also, be sure to include the Tranche Project in anything on this topic. I
> think they're a long way towards the goal.
> https://proteomecommons.org/tranche/
> jtw
>
> On Sat, Apr 17, 2010 at 10:19 AM, Tom Moritz <tom.moritz at gmail.com> wrote:
>>
>> Sorry to be jumping in without tracking the list regularly, but there are
>> some Stateside projects seeking to solve the problem that Rufus describes.
>> If I'm not mistaken, this is precisely what iRODS (formerly at the San Diego
>> Supercomputer Center, now migrated to the University of North Carolina) has
>> set out to address [SEE: https://www.irods.org/ ], and the NSF-supported
>> TeraGrid has been grappling with this as well [SEE: https://www.teragrid.org/ ],
>> as has the Open Science Grid [SEE:
>> http://www.opensciencegrid.org/About/News_Archive/Open_Science_Grid_Receives_30_Million_Dollar_Award
>> ]
>>
>> I have been in some discussions over the past weeks and months with the
>> UNFCCC, the US EPA, and others about how best to manage at least
>> foundational ("canonical"?) data sets while providing the level of
>> transparency and accountability that was obviously necessary in the recent
>> IPCC dust-up. I believe we may be best off picking certain such data sets
>> and thoroughly modeling best practice on them?
>>
>> Tom
>>
>> Tom Moritz
>> 1968 1/2 South Shenandoah Street,
>> Los Angeles, California 90034
>> USA
>> +1 310 963 0199 (cell)
>> tommoritz (Skype)
>> http://www.linkedin.com/in/tmoritz
>>
>> “Πάντα ῥεῖ καὶ οὐδὲν μένει” (Everything flows, nothing stands still.)
>> --Heraclitus
>>
>>
>> On Fri, Apr 16, 2010 at 1:41 PM, Rufus Pollock <rufus.pollock at okfn.org>
>> wrote:
>>>
>>> On 16 April 2010 20:03, Jonathan Gray <jonathan.gray at okfn.org> wrote:
>>> > Piece about BioTorrents on the Nature blog:
>>> >
>>> >  http://blogs.nature.com/news/thegreatbeyond/2010/04/improving_the_portability_of_d_1.html
>>>
>>> There used to be something like this for geodata (geotorrents.org), but
>>> it has since disappeared. We've thought quite a lot about torrents for
>>> data distribution before [1] [2]. The problem with BitTorrent (at least
>>> in the small-scale experiments we did) is that it provides no mechanism
>>> for the storage allocation you need for a real data "grid", specifically:
>>>
>>> a) How do you deal with large files (GBs) that individual peers may not
>>> want to be responsible for holding and sharing in their entirety? The
>>> obvious answer is sharding, but BitTorrent has no mechanism for sharding
>>> a given file across peers (so you need some mechanism above BitTorrent
>>> to do this).
>>>
>>> b) (the biggie) How do you do file/load allocation and rebalancing to
>>> ensure you don't lose data as peers enter and leave the network? Even
>>> with lots of participants, how do you decide which files to allocate to
>>> whom, and how do you coordinate changes over time so you don't end up
>>> with everyone hosting the same one file!
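A minimal sketch of one way to attack both points, assuming a file is split into fixed-size shards and each shard is placed on `replicas` peers chosen by rendezvous (highest-random-weight) hashing. This is a swapped-in technique for illustration only; it is not claimed to be what BitTorrent or the Tahoe grid mentioned in this thread do, and the peer names, shard IDs, shard size, and replica count are all hypothetical. Because every peer can compute the same ranking independently, no central allocator is needed, and when a peer leaves only the shards it held need new homes.

```python
import hashlib


def shard_file(data: bytes, shard_size: int) -> list[bytes]:
    """Split a file's bytes into fixed-size shards (the last may be shorter)."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def score(peer: str, shard_id: str) -> int:
    """Deterministic pseudo-random weight for a (peer, shard) pair."""
    digest = hashlib.sha256(f"{peer}:{shard_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign(shard_id: str, peers: list[str], replicas: int) -> list[str]:
    """Rendezvous hashing: the `replicas` highest-scoring peers hold the shard."""
    ranked = sorted(peers, key=lambda p: score(p, shard_id), reverse=True)
    return ranked[:replicas]


def allocate(data: bytes, file_id: str, peers: list[str],
             shard_size: int, replicas: int) -> dict[str, list[str]]:
    """Map each shard ID (hypothetical "file#index" scheme) to its holders."""
    n_shards = len(shard_file(data, shard_size))
    return {
        f"{file_id}#{i}": assign(f"{file_id}#{i}", peers, replicas)
        for i in range(n_shards)
    }


if __name__ == "__main__":
    peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]  # hypothetical
    data = bytes(10_000)  # stand-in for a large file
    before = allocate(data, "dataset", peers, shard_size=1_000, replicas=2)
    after = allocate(data, "dataset", [p for p in peers if p != "peer-c"],
                     shard_size=1_000, replicas=2)
    moved = [s for s in before if before[s] != after[s]]
    print("shards that needed new holders:", moved)
```

Running it shows that removing one peer changes only the shards that peer held; every other shard keeps the same holders, which keeps rebalancing traffic proportional to the churn rather than to the size of the grid.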
>>>
>>> What you really need here is a proper wide-area distributed storage
>>> solution. We tried to build something along these lines last year by
>>> running a Tahoe grid: <http://grid.okfn.org/> (more info in [3],
>>> requirements in [2]).
>>>
>>> I would still say this is a (long) way from success due to the various
>>> social and technical issues involved:
>>>
>>> * you need your grid software to be (very) easy to install
>>> * you may impose significant storage and bandwidth demands on users
>>> * (most significant) you really need *big* scale to provide
>>> reliability and availability. This is unlike distributed processing
>>> projects, where work can happen any old time and people can contribute
>>> processing whenever they want. With a storage grid you really need
>>> either a) massive scale or b) strong commitment to participation from
>>> peers, so that groups of users going offline or leaving the grid don't
>>> compromise availability.
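To put rough numbers on the scale-versus-commitment trade-off in the last point, here is a back-of-the-envelope sketch. It assumes each replica sits on a different peer and that peers are online independently with probability p; that is an idealization, and the correlated case described above (a group of users leaving together) is exactly what makes real grids do worse than this. The function names are hypothetical.

```python
def shard_availability(p_online: float, replicas: int) -> float:
    """Probability that at least one replica of a shard is online.

    Assumes each replica lives on a different peer and peers are online
    independently with probability p_online (an idealization).
    """
    return 1 - (1 - p_online) ** replicas


def file_availability(p_online: float, replicas: int, n_shards: int) -> float:
    """Probability that every shard of a file is reachable, same assumptions."""
    return shard_availability(p_online, replicas) ** n_shards


if __name__ == "__main__":
    for p in (0.5, 0.9):
        for k in (1, 3, 5):
            print(f"p={p} replicas={k} "
                  f"shard={shard_availability(p, k):.4f} "
                  f"file(100 shards)={file_availability(p, k, 100):.4f}")
```

With p = 0.5 and three replicas, a single shard is reachable with probability 0.875, but a file split into four shards is fully reachable only about 59% of the time. Raising p (commitment) or the replica count (which needs scale) are the only two levers, which matches the a)/b) framing above.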
>>>
>>> [1]: http://wiki.okfn.org/p/Data_Distribution
>>> [2]: http://wiki.okfn.org/p/Distributed_Storage
>>> [3]: http://wiki.okfn.org/p/Distributed_Storage/Plan
>>>
>>> > It would be interesting to liaise with them about a registry of open
>>> > data (e.g. CKAN and suchlike).
>>>
>>> It would be no problem to create CKAN packages linking to the relevant
>>> torrents. Does anyone have contacts with the BioTorrents people so we
>>> could have a chat?
>>>
>>> Rufus
>>>
>>> _______________________________________________
>>> open-science mailing list
>>> open-science at lists.okfn.org
>>> http://lists.okfn.org/mailman/listinfo/open-science
>>
>



-- 
Jonathan Gray

Community Coordinator
The Open Knowledge Foundation
http://blog.okfn.org

http://twitter.com/jwyg
http://identi.ca/jwyg



