[open-bibliography] [ANN] OFS - a python bitstream storage API w/ S3/pairtree/archive.org backends
Ben O'Steen
bosteen at gmail.com
Thu Sep 9 14:06:34 UTC 2010
http://openbiblio.net/2010/09/09/introducing-ofs-a-python-bucketobject-storage-library/
Many members of this list likely share the problem of storing
bitstreams in an object- or URI-oriented manner, so forgive me for
posting about this general storage API here.
Blog post text follows:
-----------------------------------------------------------------
Many internally distributed storage systems – such as Amazon’s S3
service or Riak’s key-value architecture – have similarities in the
manner in which data is labelled and subsequently retrieved. This is
often because the systems themselves use a distributed hash table or a
similar distribution algorithm to disperse and then later find the data
they store.
OFS is a Python library that seeks to capitalise on their similarities –
providing a single, general API to put and get files from one of these
services while hiding the specifics of the implementation from the user.
This allows for local testing and development before transitioning to
one of the cloud services, which typically cost real money and slow
down testing because every call has to travel over an internet
connection.
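To make the idea concrete, here is a minimal sketch of what a shared put/get interface with swappable backends could look like. The class and method names (Storage, MemoryStorage, DirectoryStorage, put, get, list_labels) are assumptions made up for illustration and are not the actual OFS API; the directory backend also assumes that labels contain no path separators.

```python
# Hypothetical sketch only: none of these names are taken from OFS itself.
import os
import tempfile
from abc import ABC, abstractmethod


class Storage(ABC):
    """The common interface that calling code is written against."""

    @abstractmethod
    def put(self, bucket: str, label: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, bucket: str, label: str) -> bytes: ...

    @abstractmethod
    def list_labels(self, bucket: str) -> list: ...


class MemoryStorage(Storage):
    """Keeps everything in a dict; handy for unit tests."""

    def __init__(self):
        self._buckets = {}

    def put(self, bucket, label, data):
        self._buckets.setdefault(bucket, {})[label] = data

    def get(self, bucket, label):
        return self._buckets[bucket][label]

    def list_labels(self, bucket):
        return sorted(self._buckets.get(bucket, {}))


class DirectoryStorage(Storage):
    """One directory per bucket, one file per label (labels without '/')."""

    def __init__(self, root):
        self.root = root

    def put(self, bucket, label, data):
        os.makedirs(os.path.join(self.root, bucket), exist_ok=True)
        with open(os.path.join(self.root, bucket, label), "wb") as handle:
            handle.write(data)

    def get(self, bucket, label):
        with open(os.path.join(self.root, bucket, label), "rb") as handle:
            return handle.read()

    def list_labels(self, bucket):
        bucket_dir = os.path.join(self.root, bucket)
        return sorted(os.listdir(bucket_dir)) if os.path.isdir(bucket_dir) else []


def archive_report(store: Storage) -> bytes:
    """Application code: it never needs to know which backend it was given."""
    store.put("reports", "2010-09.txt", b"hello")
    return store.get("reports", "2010-09.txt")


if __name__ == "__main__":
    print(archive_report(MemoryStorage()))
    with tempfile.TemporaryDirectory() as tmp:
        print(archive_report(DirectoryStorage(tmp)))
```

Because archive_report only depends on the shared interface, a remote backend could in principle be dropped in later without touching it, which is the point of developing against a local store first.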
Characteristics of OFS:
* Uses a ‘bucket/label’ mechanism to identify individual files
* Provides a list of the content in a given bucket (as best as the
  service can provide)
* Provides per-file metadata in so far as the service can provide it
  (key-value or JSON-encodable data)
* Current backend plugins:
    * Local storage – based on the pairtree specification that
      optimises file-distribution across a native file-system
      to handle large quantities of files. Uses JSON to encode
      arbitrary metadata about the files in a given bucket.
    * Remote storage (S3 and Archive plugins written
      by Friedrich Lindenberg (pudo) who has also made large
      contributions to the codebase):
        * Amazon S3
        * Archive.org
        * Riak (in progress)
    * Also in progress – a REST Client by Friedrich Lindenberg
      (pudo)
* One key desire is to provide opaque sharding – breaking
  up very large files to spread across buckets or even
  systems to improve performance and the range of services
  or backend systems OFS can make use of.
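For readers unfamiliar with pairtree, the sketch below shows the identifier-to-path mapping as described in the pairtree specification: awkward characters are escaped, then the cleaned identifier is split into two-character directory names so that no single directory ends up holding a huge number of entries. This is a simplified illustration of the specification's mapping, not code taken from the OFS local backend, and the function names are my own.

```python
# Simplified sketch of the pairtree identifier-to-path mapping.
# Function names are illustrative and not taken from the OFS codebase.

# Characters that get hex-encoded as ^hh (in addition to any byte
# outside the visible ASCII range 0x21-0x7e).
HEX_ENCODE = set('"*+,<=>?\\^|')

# Common but filesystem-unfriendly characters mapped to a single character.
SINGLE_CHAR = {"/": "=", ":": "+", ".": ","}


def clean_identifier(identifier: str) -> str:
    """Escape an identifier so it is safe to use in directory names."""
    cleaned = []
    for byte in identifier.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in HEX_ENCODE:
            cleaned.append("^%02x" % byte)
        else:
            cleaned.append(SINGLE_CHAR.get(char, char))
    return "".join(cleaned)


def pairtree_path(identifier: str) -> str:
    """Split the cleaned identifier into two-character path segments."""
    cleaned = clean_identifier(identifier)
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join(pairs)


if __name__ == "__main__":
    print(pairtree_path("ark:/13030/xt12t3"))
```

Run on the identifier ark:/13030/xt12t3, this prints ar/k+/=1/30/30/=x/t1/2t/3, so each directory level only ever branches on two characters of the identifier.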
Being able to write storage code in one common way, while making use of
local as well as remote ‘cloud’ storage, is plainly of great benefit.
It encourages file storage to be codified in a distributable manner so
that scaling later on is easier.
This is a work in progress, but the local implementation is intended to
be both a reference implementation and a useful testing or even
production backend for storage. Other backends may have less
comprehensive metadata support for individual files, but these
‘limits’ will be surfaced as optional warnings or exceptions once we
have a handle on what they are.
Please comment or give feedback on this library. Also, we would welcome
any patches for other backend support to the library!
http://bitbucket.org/okfn/ofs