[okfn-discuss] data distribution

Rufus Pollock rufus.pollock at okfn.org
Wed Jan 18 11:43:16 UTC 2006


Francis Irving wrote:
> What specific data have you got that is so large, and who needs access
> to it?

Nothing in particular at present, but looking ahead I was thinking 
of geodata, or large amounts of text (I believe you thought one day 
there might be 2 GB of publicwhip data). I was also interested purely from 
the design perspective: if you were to set up a data distribution system 
today, what would you do?

> I'm doubtful that Bittorrent will be helpful - I believe it only works
> if lots of people want to download the same data, which is unlikely
> with the sort of sets we'll be working on.

With no additional downloaders, won't BT be approximately the same as 
a straight one-to-one download (with marginal overhead for hashing 
etc.)? Of course it will be more complex for the client ... (though you 
get restarting for free).
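
To make the "hashing overhead" and "restarting for free" points concrete, here is a rough sketch in Python. BT splits a file into fixed-size pieces and records a SHA-1 hash for each one, so a client can tell which pieces of a partial download are good and fetch only the rest. The piece size and the function names below are my own assumptions for illustration, not taken from any particular client.

```python
import hashlib

# Assumed piece length for illustration; real torrents choose their own.
PIECE_SIZE = 256 * 1024


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Return the SHA-1 digest of each fixed-size piece of the data."""
    return [
        hashlib.sha1(data[start:start + piece_size]).digest()
        for start in range(0, len(data), piece_size)
    ]


def missing_pieces(partial: bytes, expected: list,
                   piece_size: int = PIECE_SIZE) -> list:
    """Return indices of pieces that are absent or corrupt in a partial file."""
    missing = []
    for index, want in enumerate(expected):
        chunk = partial[index * piece_size:(index + 1) * piece_size]
        # A truncated or empty chunk hashes differently, so it counts as missing.
        if hashlib.sha1(chunk).digest() != want:
            missing.append(index)
    return missing
```

The cost is one hash pass over the file on each side, which is small next to the transfer itself. After an interrupted download the client only re-requests the indices returned by missing_pieces.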

> I suspect traditional download mechanisms are fine. Broadband is
> getting much faster now, and you can download the amount of data on a
> CD relatively quickly. Certainly, download speed hasn't been a
> limiting factor for anything in relation to parlparse - processor 
> speed and RAM more important, and that only because our parsing
> scripts aren't even remotely optimised.

Good point. I suppose the main advantages are:
   1. Scales to large files (what about a 4 GB file, or even a 20 GB 
file of geodata?)
   2. Easy to do mirroring: just get other people to leave their BT 
client on (permanently or for a good period) after downloading so that 
others can download from them. This is already encouraged as being polite, 
but it also allows for simple mirroring and download sharing.
   3. Defaults to a straight download if there is only one seed

Disadvantages:
   1. More complex for client
   2. More complex for server (marginally so)

The main issue is the complexity for the client. Given that you have to 
keep the file on disk anyway (for the first seed), you could always 
add HTTP access as a fallback.
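
As a rough illustration of what the HTTP fallback could look like on the client side, the sketch below builds a request that resumes an interrupted download using the standard HTTP Range header. The function name and the example URL are hypothetical, and it assumes the server honours range requests (it answers 206 Partial Content if so, and otherwise sends the whole file again with a 200).

```python
import urllib.request


def resume_request(url: str, bytes_on_disk: int) -> urllib.request.Request:
    """Build an HTTP request that skips the bytes already downloaded."""
    request = urllib.request.Request(url)
    if bytes_on_disk > 0:
        # Ask only for the remainder of the file, from this offset onwards.
        request.add_header("Range", f"bytes={bytes_on_disk}-")
    return request


# Hypothetical usage: 1024 bytes are already on disk.
request = resume_request("http://example.org/data.tar", 1024)
```

This gives plain HTTP clients a cruder form of restarting, though without the per-piece integrity check that BT provides.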

Regards,

Rufus

> On Thu, Jan 12, 2006 at 07:23:18PM +0000, Rufus Pollock wrote:
> 
>>Recently I've been thinking about distributing data, especially large 
>>amounts of it (I've started an incubator project on OKFN [1]). Recent 
>>trends in distributing software seem to be towards using a p2p protocol 
>>such as bittorrent, though it is noticeable that most software download 
>>is still fairly traditional in being straight file transfer (either via 
>>http or ftp).
>>
>>Given that for 'knowledge' we are often talking about chunks of data 
>>that are sizeable compared to normal software (perhaps equivalent to a 
>>linux distro or bigger) it seems sensible to go down this route. I don't 
>>yet know much about this and my experience with BT only extends to 
>>downloading a few distros -- I have never distributed using it -- so I 
>>would welcome any comments people had about this (I have read [2]). Some 
>>of the things I'd particularly like information on are:
>>
>>  1. Mirroring. By the nature of BT I assume you can't explicitly 
>>select a mirror as you often do when downloading in the traditional 
>>manner so to what extent does BT do mirroring for you?
>>
>>  2. Is there any way to do fallback in BT so that clients which can't 
>>support BT can download normally? If not is this an issue?
>>
>>  3. Any tips on setting up a tracker (or should one use trackerless 
>>torrents)?
>>
>>[1] http://www.okfn.org/wiki/DataDistribution
>>[2] http://www.bittorrent.com/guide.html