[open-science] BioTorrents

Rufus Pollock rufus.pollock at okfn.org
Thu Apr 22 11:07:26 UTC 2010

On 17 April 2010 18:19, Tom Moritz <tom.moritz at gmail.com> wrote:
> Sorry to be just jumping in without tracking regularly -- but there are some
> Stateside projects seeking to solve the problem that Rufus describes -- if
> I'm not mistaken this is precisely what iRods (formerly at San Diego
> Supercomputer Center now migrated to Univ of North Carolina) has set out to
> address? [SEE: https://www.irods.org/ ]  and the NSF supported Tera-Grid has

Thanks a lot for these links, Tom. I'd seen iRods before (I've also
just added it to [1]). When I last tried it out, over a year ago, the
install was non-trivial and I couldn't imagine getting a "volunteer"
grid going on that basis (the average dedicated server owner was
unlikely to get through that setup!).

[1]: <http://wiki.okfn.org/p/Distributed_Storage/Research>

Perhaps it is worth explicitly listing the requirements we put
together (listed on [1]):

  1. Robustness via replication across nodes
  2. Easy addition of nodes -- in particular we wish people to be able
     to easily "donate" nodes
  3. Good share/shard rebalancing as nodes enter and leave the network
  4. The system should be able to handle both small and very large
     files (so files should be automatically sharded)
  5. Concurrency/consistency is not a big issue
  6. Availability is a big issue
  7. Versioning would be nice (though what exactly would this mean?)
  8. Data stored would be open, so encryption/privacy is not a priority
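
To make requirements 1-4 a little more concrete, here is a rough
sketch of one way sharding, replication and rebalancing could fit
together. It is only an illustration, not something taken from [1] or
from any of the systems mentioned in this thread: the function names,
the fixed shard size and the choice of rendezvous (highest-random-
weight) hashing for placement are all my own assumptions.

```python
import hashlib


def shard(data: bytes, shard_size: int) -> list[bytes]:
    """Split data into fixed-size shards (the last one may be shorter)."""
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def shard_id(chunk: bytes) -> str:
    """Content-derived identifier for a shard (hypothetical scheme)."""
    return hashlib.sha256(chunk).hexdigest()


def score(node: str, sid: str) -> int:
    """Deterministic pseudo-random weight for a (node, shard) pair."""
    digest = hashlib.sha256(f"{node}:{sid}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(sid: str, nodes: list[str], replicas: int) -> list[str]:
    """Rendezvous hashing: keep the `replicas` highest-scoring nodes."""
    ranked = sorted(nodes, key=lambda n: score(n, sid), reverse=True)
    return ranked[:replicas]


def placement(data: bytes, nodes: list[str],
              shard_size: int = 64, replicas: int = 2) -> dict[str, list[str]]:
    """Map every shard id of `data` to the nodes that should hold a copy."""
    return {
        shard_id(chunk): place(shard_id(chunk), nodes, replicas)
        for chunk in shard(data, shard_size)
    }


if __name__ == "__main__":
    # Node names are made up for the example.
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    data = b"".join(i.to_bytes(4, "big") for i in range(256))
    before = placement(data, nodes)
    after = placement(data, [n for n in nodes if n != "node-c"])
    moved = sum(1 for sid in before if before[sid] != after[sid])
    print(f"{moved} of {len(before)} shards need a new replica")
```

The point of choosing rendezvous hashing here is that when a node
leaves, only the shards that had a replica on that node get a new
home; every other shard stays where it was. Likewise, a newly
"donated" node only takes over shards for which it scores highest.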

> been grappling with this as well: SEE: https://www.teragrid.org/ ]
> and similarly the Open Science Grid [SEE:
> http://www.opensciencegrid.org/About/News_Archive/Open_Science_Grid_Receives_30_Million_Dollar_Award
> ]

These both seem to be services rather than "software" -- I may have
missed something, but I couldn't see that the software behind e.g.
Open Science Grid is open source and available for download.

> I have been in some discussions in past weeks and months with UNFCC, US
> EPA, and others about how best to manage at least foundational data sets
> ("canonical"?) while providing precisely the level of transparency and
> accountability that was obviously necessary in the recent IPCC dust-up...  I
> believe that we may be best off picking certain such data and thoroughly
> modeling best practice...???

Indeed. I certainly think it would be interesting to find out more
about what is on offer and *in particular* about people's actual
experience using that software or service -- perhaps updates to

Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/
