[okfn-discuss] Distributed Storage: Suggestions?

julian at goatchurch.org.uk julian at goatchurch.org.uk
Thu Apr 23 14:29:07 UTC 2009


Quoting Lukasz Szybalski <szybalski at gmail.com>:

> I guess the question would be: Could you describe the type of data you
> currently have. (percentage of space, downloads, changes)
>

This is the directory that has broken the system (watch
out: it may break your browser):

http://ukparse.kforge.net/svn/undata/pdf/

It's several thousand large PDFs of UN documents.  The
same would apply to scanned images, archived pages from
Hansard, etc.  


At the moment I'm storing it in SVN as a means of
distribution, but it unnecessarily doubles the disk
usage, and some of the SVN clients are very unhappy
with the size of the directory.


SVN is entirely inappropriate for these large binary
files (there are no versions); it's convenient only
because the code that handles these binary files is in
SVN (where it belongs), and the fewer means of
distribution the better.  But it's not scaling any more.


We need a better answer for parking the data for these
projects, where we'd keep the scraping/parsing code in
SVN on kforge (SVN is designed for code), and handle
these large sets of large non-versioned files some other
way.
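
To make the split concrete, here is a rough sketch of one possible
arrangement, not a proposal for any particular storage service: a small
text manifest (path, size, checksum) lives in SVN next to the code, and
the PDFs themselves sit on whatever plain file store we end up with.
Everything named here (build_manifest, write_manifest,
missing_or_changed, the tab-separated layout, the choice of SHA-256) is
made up for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are never fully loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Return sorted (relative_path, size_in_bytes, sha256) rows for data_dir."""
    rows = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            full = os.path.join(root, name)
            # Store forward-slash paths so the manifest is the same on every OS.
            rel = os.path.relpath(full, data_dir).replace(os.sep, "/")
            rows.append((rel, os.path.getsize(full), sha256_of(full)))
    return sorted(rows)


def write_manifest(rows, manifest_path):
    """Write one tab-separated line per file; this small file is what goes in SVN."""
    with open(manifest_path, "w", encoding="utf-8") as out:
        for rel, size, digest in rows:
            out.write(f"{rel}\t{size}\t{digest}\n")


def missing_or_changed(rows, local_dir):
    """List the relative paths a client would still need to download."""
    needed = []
    for rel, size, digest in rows:
        local = os.path.join(local_dir, *rel.split("/"))
        if (not os.path.isfile(local)
                or os.path.getsize(local) != size
                or sha256_of(local) != digest):
            needed.append(rel)
    return needed
```

A checkout would then carry only the code and the manifest, and a client
would fetch just the files that missing_or_changed reports, by whatever
transport the data store offers.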



Julian.




