[okfn-discuss] OKF grid, Tahoe-LAFS, Cassandra, MongoDB

Rufus Pollock rufus.pollock at okfn.org
Thu Oct 7 16:28:52 UTC 2010

On 6 October 2010 07:40, Zooko O'Whielacronx <zooko at zooko.com> wrote:
> Dear Rufus Pollock and other OKF folks:

Thanks for writing Zooko -- and let me reiterate that I have nothing
but admiration for the sterling work you guys have been doing. Anyway,
on to comments on particular points ...


> I've reviewed the discussion that Rufus started on the tahoe-dev
> mailing list nine months ago [1]. Back then I thought that what Rufus
> was asking for sounded reasonable enough, and much of it seemed
> definitely doable, but for some of it I wasn't really sure of the
> details—what specifically was required and if it was a reasonable
> thing to want or if it was even possible to implement it all. I'm

Absolutely -- also, my impression was storage accounting was a fairly
big job ...

> still not entirely sure today, and I'm interested in seeing how some
> other tools such as MongoDB provide for OKF's needs. If it can, then
> that example can show me how Tahoe-LAFS can be used likewise. If it
> can't, then this gives me increased confidence that the original
> desiderata for the OKF grid were too strong.

Excellent point -- I was talking with an ex-researcher in wide-area
distributed storage a few months ago and he basically said: this is a
hard problem and no one has solved it yet (not necessarily hard
technically but socially -- having enough people participating to
ensure a stable grid). For more on these challenges see:

The current spec of the overall requirements is (from
There is an addressable file-space (e.g. a virtual file-system) which
is distributed over multiple machines (nodes). Key features:

  * '''Wide area''': we have a preference for a wide-area system, i.e.
we do not expect all the nodes to be in a single data-centre or on a
single high-speed network but rather to be distributed across the
internet
    * Even a single data-centre solution would be interesting though
  * '''Robustness''': data must not be lost if a given node (or even
k) nodes disappear
    * This implies replication, i.e. data must be automatically
replicated across nodes
  * '''Easy addition of nodes''': it should be easy for an average
sysadmin to install and configure a node (e.g. debian package should
be available)
    * We want people to be able to easily "donate" nodes
  * '''Share/shard-rebalancing''': should have good re-balancing to
handle (permanent) node entry and exit
  * '''Different file sizes''': the system should be able to handle
small and very large files (so files should be automatically sharded)
  * '''Availability''': high guarantee of data availability (so the
disappearance of a given node does not make data unavailable)
  * '''Open data focused''': focused on data/content that is
[[http://opendefinition.org/|open]] so encryption/privacy is '''not'''
a priority
  * '''F/OSS''': must be free/open source software so we can build
[[http://opendefinition.org/ossd|open services]]
  * '''Eventually consistent''': strong concurrency/consistency is not
required as long as the system is eventually consistent (we know our CAP)
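
As a rough illustration of the robustness bullet above (my own sketch,
not part of the spec): with a k-of-N erasure code -- Tahoe-LAFS
defaults to 3-of-10 -- data survives as long as any k of the N shares
remain reachable:

```python
# Illustrative arithmetic only: with a k-of-N erasure code (Tahoe-LAFS
# defaults to k=3, N=10), a file is split into N shares, any k of which
# are enough to reconstruct it, so up to N - k share-holding nodes may
# vanish without data loss.
def max_node_losses(total_shares: int, needed_shares: int) -> int:
    """Number of share-holding nodes that can disappear without data loss."""
    if needed_shares > total_shares:
        raise ValueError("cannot need more shares than exist")
    return total_shares - needed_shares

print(max_node_losses(10, 3))  # Tahoe's default 3-of-10 tolerates 7 losses
```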

> In this note I'll talk first about encryption and then about space accounting.
> Let's tackle the issue of encryption, because I think it is kind of a
> red herring and I hope to get it out of the way and concentrate on the
> really hard issues. Tahoe-LAFS's encryption can be understood as:

I mostly agree (my point was always about usability not about any
flaws in the model) so let's assume complete agreement and I'll snip
this section.


> Next, let's talk about the "space accounting" issue. This one I
> definitely understand as being a reasonable thing to want and a thing
> that could be feasibly implemented. Let's distinguish between two
> goals:
> Goal 1: I want to allow users to read (download) files without thereby
> allowing them to write (upload) them.


> Goal 2: I want to allow server operators to contribute space on their
> storage server without thereby allowing them to consume space on other
> storage servers.

Yes, though the two are related. Given the p2p nature of Tahoe (if I
understand correctly), if someone else starts a node, joins the
network, and allows uploads on *that* node, that content will
propagate to other nodes. I guess the answer is that node owners
should shut down write access except through the main proxy.

> Goal 1 is already possible using an HTTP proxy in front of the
> Tahoe-LAFS gateway. This is already done in practice, as recently
> discussed on the tahoe-dev list [2].

That's what we also implemented with
<http://knowledgeforge.net/okfn/grid/> (I apologize for not announcing
that on tahoe-dev but it was, and is, rather alpha ...)
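
A minimal sketch of the read-only-proxy idea (my own, not the okfn
grid code; the gateway address and port are assumptions):

```python
# Hypothetical sketch: a read-only HTTP proxy in front of a Tahoe-LAFS
# gateway, assumed here to listen on 127.0.0.1:3456. GET/HEAD requests
# pass through; anything else is refused -- goal 1, read without write.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

GATEWAY = "http://127.0.0.1:3456"  # assumed gateway address
READ_METHODS = {"GET", "HEAD"}

def allowed(method: str) -> bool:
    """The entire access policy: only read methods get through."""
    return method.upper() in READ_METHODS

class ReadOnlyProxy(BaseHTTPRequestHandler):
    def _handle(self):
        if not allowed(self.command):
            self.send_error(405, "read-only grid: uploads not allowed")
            return
        with urlopen(GATEWAY + self.path) as resp:  # forward to the gateway
            body = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    do_GET = do_HEAD = do_PUT = do_POST = do_DELETE = _handle

def main():
    HTTPServer(("", 8080), ReadOnlyProxy).serve_forever()

# call main() to run the proxy
```

In practice one would do this with nginx or similar rather than a
hand-rolled proxy, but the policy is the same: method-level filtering
in front of the gateway.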

> Goal 2 is much trickier. To allow goal 2, as has been mentioned on
> this thread, Tahoe-LAFS developers have a plan to add strong
> distributed space accounting in the future, which plan we haven't made
> much progress on in the last nine months.
> What interests me for the OKF grid is: what are the alternatives? From
> my experience using Cassandra I'm pretty sure that it is even less

We don't have much interest in Cassandra ...

> capable than Tahoe-LAFS is at goal 2, and it can be served up behind
> an HTTP proxy just as well as Tahoe-LAFS can. I would assume (without
> knowing much) that the same goes for MongoDB and couchdb and every
> other system on the planet. :-)

Yes, you are quite correct -- the same holds for any system that
doesn't "tag" a given bit of data with its owner/source node.

However, there is one difference with Tahoe I believe (if I remember
correctly and matters haven't changed): in Tahoe someone can upload
files and fail to make the readcap available. I also believe they can
upload to a new root node they create, in which case I won't even see
that node if I 'walk' the filesystem.

In other systems, if someone uploads content, "I" will definitely be
able to see it -- and can, for example, enforce a policy such as: any
piece of content without a valid owner field will be deleted.
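
A minimal sketch of what that policy could look like (the metadata
layout and the owner registry are illustrative stand-ins, not any real
OKF grid or MongoDB API):

```python
# Hypothetical sketch of the owner-field policy: periodically sweep the
# object metadata and delete anything without a valid owner. In a real
# deployment this would be a scheduled job over the datastore.
REGISTERED_OWNERS = {"rufus", "zooko"}  # assumed owner registry

def sweep(objects):
    """Keep only objects carrying a valid owner field; the rest are deleted."""
    return [o for o in objects if o.get("owner") in REGISTERED_OWNERS]

objects = [
    {"id": "a", "owner": "rufus"},    # valid owner: kept
    {"id": "b"},                      # no owner field: deleted
    {"id": "c", "owner": "mallory"},  # unregistered owner: deleted
]
print([o["id"] for o in sweep(objects)])  # -> ['a']
```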

> So in sum, Tahoe-LAFS already allows goal 1 and is actually used that
> way in practice, and Tahoe-LAFS might in the future (especially if
> someone else pitches in and helps) achieve goal 2, which no other
> current system to my knowledge can offer either.

Agreed, modulo the major caveat above.

> Oh, we should really think about another goal which wasn't explicitly
> mentioned before but which is probably actually very important:
> Goal 3: I want to allow server operators to contribute space on their
> storage server without thereby allowing them to overwrite or delete
> files on other storage servers.


> Tahoe-LAFS already offers goal 3, and I'm pretty sure that it is the
> only system that offers goal 3 and the only one that is likely to in
> the near future. (I would love to be proven wrong.)

You also offer sharding and share-rebalancing (some others do these
too, but they are a major challenge)!

> Okay, so now that I've sat down and written this letter, it sounds to
> me like maybe Tahoe-LAFS is a reasonable tool for OKF to move forward
> with after all. Or at least, it isn't that much more unreasonable than
> any alternative that I know of. ;-)

Yes, you've definitely made it clear we should go revisit this and see
what we can do.

> I'm sorry that I didn't figure this out and write this letter nine
> months ago when you first asked, but honestly, I was uncertain. In the
> time that has passed since then I've learned a lot and gotten familiar
> with Cassandra. It wasn't until I actually wrote this letter that I
> thought things through in these terms.

Thank you very much for taking the time to write :) and I look forward
to your responses to some of my queries above.



> [1] http://tahoe-lafs.org/pipermail/tahoe-dev/2009-June/001985.html
> [2] http://tahoe-lafs.org/pipermail/tahoe-dev/2010-October/005336.html

Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/
