[okfn-dev] data package source

Matthew Brett matthew.brett at gmail.com
Tue Dec 7 15:26:42 UTC 2010


Hi Rufus and all,

We (nipy folks [1], and neurodebian folks [2], and maybe others) have been
thinking a little bit about what we wanted from a data package implementation.

First - an apology.  I have tried to explore datapkg, but rather superficially.
What we've done, in the main, is to try and think out what we mean by stuff, and
what we want, and we're slowly then coming back to what y'all have done.

In our first implementation of data packages, before we knew about datapkg, we
did something extremely simple (but nevertheless not very good).  If you're
interested, the implementation is in nibabel [3].  After we'd half-heartedly
used that for a while, it became obvious that it was too clumsy and a little
difficult to understand, for the simple case where you want to unpack files
somewhere and point the code at the files.

Now we're thinking what we really want.  The result of various discussions ended
up in the attached document ``data_pkg_discuss.rst``.  As the name suggests,
it's trying to clarify various ideas we had about what is what.

Now onto something real, use cases...

We have - for example - a smallish package for reading image data - nibabel.  We
want to be able to use optional data packages from within nibabel.  In
particular, we wanted packages of test data of images in various formats, that
are too large to include in the code repository.  Here are some things we wanted:

* No dependency for nibabel on the data packaging code.  That is, we wanted to
  be able to *use* installed data packages without having to install - say
  ``datapkg``.  This is obviously not essential, but desirable.  We're less
  concerned about having to depend on - say - ``datapkg`` for installing the
  data, or modifying the data packages.  Having said that, it would surely help
  adoption of a standard packaging system if it was easy to implement a
  packaging protocol outside of the canonical implementation in - say -
  ``datapkg``.
* Support for data package versions.  We expect to have several versions of
  nibabel out in the wild, and maybe several versions of nibabel on a single
  machine.  The versions of nibabel may well need different versions of the data
  packages to run their tests.  Even if there is just one version on the
  computer, it might be an older version that wants a version of the data
  package that is older than the current version.  Thus we want to be able to
  ask for different versions of a data package, and to be able to have several
  versions of a package installed at any one time.
* Support for user and system installs of data. As for python package installs,
  we expect some of our packages to be installed system-wide and available for
  all users, and others to be installed just for a single user.  We want to be
  able to install data with the same distinction, so that system-wide packages can
  see system-wide data.  It should be possible for an individual piece of code
  to find an individual data package, whether it is installed system-wide, or
  only for the user.
* Not of urgent importance for us, but it would be good to be able to sign the
  packages with a trusted key, as for Debian packages.
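
To make the first three points concrete, here is a minimal sketch of what the
lookup side could look like using only the Python standard library.  It assumes
the ini layout that the attached ``register_me.py`` writes (one section per
package name, one option per version, the install path as the value).  The
function name ``find_data_package`` and the ``/etc`` system location are
illustrative assumptions, not part of ``datapkg`` or of any agreed spec.

```python
import configparser
from os.path import expanduser, join as pjoin


def find_data_package(name, version, ini_files):
    """Return the registered path for ``name`` at ``version``, or None.

    ``ini_files`` is searched in order, so listing the user file first
    lets a user install shadow a system-wide one.
    """
    for ini_fname in ini_files:
        # interpolation=None so a '%' in a path is read back literally
        registry = configparser.ConfigParser(interpolation=None)
        registry.read(ini_fname)  # files that do not exist are skipped
        if registry.has_option(name, version):
            return registry.get(name, version)
    return None


# Assumed default locations on Unix (hypothetical, mirroring the script)
USER_INI = pjoin(expanduser('~'), '.dpkg', 'local.dsource')
SYSTEM_INI = pjoin('/etc', 'dpkg', 'local.dsource')
```

A caller such as nibabel could then ask for, say,
``find_data_package('nibabel-data', '0.2', [USER_INI, SYSTEM_INI])`` (the
package name here is made up) and fall back to skipping the relevant tests if
the result is ``None``.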

For these various reasons we tried to spec out what we thought we would need in
the attached ``data_pkg_uses.rst``.  I've also attached a script referenced in
that page, ``register_me.py`` - as ``register_me.txt``.

Given my relative ignorance of ``datapkg``, I'll try to list the differences I
see between what we want and the current ``datapkg``:

* I can't see support for data package versioning in ``datapkg`` - but I might
  have missed it.
* As far as I can see, there isn't a separation of system and user installs, in
  that there seems to be a (by default) sqlite 'repository' (right term?) that
  knows about the packages a user has installed, but I could not find an
  obvious canonical way to pool system and user installation information.  Is
  that right?
* Because the default repository is sqlite, anyone trying to read the
  installations that ``datapkg`` recorded will need sqlite or something similar.
  They'll likely have this if they are using a default python installation, but
  not necessarily if they are using another language or a custom python install.

Are these right?  Do our use cases make sense to y'all?

We'd love to work together on stuff if that makes sense to you too...

See you,


Matthew (for various of us).



[1] http://nipy.org
[2] http://neuro.debian.net
[3] http://nipy.org/nibabel/devel/data_pkg_design.html
-------------- next part --------------
from os.path import join as pjoin, expanduser, abspath, dirname
import sys
# Python 3 compatibility
try:
    import configparser as cfp
except ImportError:
    import ConfigParser as cfp

if sys.platform == 'win32':
    HOME_INI = pjoin(expanduser('~'), '_dpkg', 'local.dsource')
else:
    HOME_INI = pjoin(expanduser('~'), '.dpkg', 'local.dsource')
# Absolute system path; a bare 'etc' would resolve against the current directory
SYS_INI = pjoin(abspath('/etc'), 'dpkg', 'local.dsource')
# Resolve to an absolute path so the registered location works from any directory
OUR_PATH = dirname(abspath(__file__))
OUR_META = pjoin(OUR_PATH, 'meta.ini')
DISCOVER_INIS = {'user': HOME_INI, 'system': SYS_INI}

def main():
    # Get ini file to which to write
    try:
        reg_to = sys.argv[1]
    except IndexError:
        reg_to = 'user'
    if reg_to in ('user', 'system'):
        ini_fname = DISCOVER_INIS[reg_to]
    else: # it is an ini file name
        ini_fname = reg_to

    # Read parameters for our distribution
    meta = cfp.ConfigParser()
    files = meta.read(OUR_META)
    if len(files) == 0:
        raise RuntimeError('Missing meta.ini file')
    name = meta.get('DEFAULT', 'name')
    version = meta.get('DEFAULT', 'version')

    # Write into ini file
    dsource = cfp.ConfigParser()
    dsource.read(ini_fname)
    if not dsource.has_section(name):
        dsource.add_section(name)
    dsource.set(name, version, OUR_PATH)
    # open() and the print call below work on both Python 2 and Python 3
    with open(ini_fname, 'wt') as fobj:
        dsource.write(fobj)

    print('Registered package %s, %s to %s' % (name, version, ini_fname))


if __name__ == '__main__':
    main()
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data_pkg_uses.rst
Type: application/octet-stream
Size: 8366 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/okfn-labs/attachments/20101207/fade2826/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: data_pkg_discuss.rst
Type: application/octet-stream
Size: 9872 bytes
Desc: not available
URL: <http://lists.okfn.org/pipermail/okfn-labs/attachments/20101207/fade2826/attachment-0005.obj>

