[okfn-dev] [Nipy-devel] data package source

Satrajit Ghosh satra at mit.edu
Wed Dec 8 19:17:35 UTC 2010


hi matthew and others,

Regression testing might require large, real data-sets, and we should be
careful about the size of the data-set that this package will provide.

one option is to consider using xnat as an alternative for storing the data
(on central.xnat.org); the data-pkg would then simply use pyxnat to
query/retrieve the relevant data to the local machine. (i'm cc:ing yannick.)
please also consider the connectome file-format as a way of describing the
stored data.
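
to give a flavour, here's a rough sketch of the kind of pyxnat retrieval i
have in mind - the credentials and project/subject/session ids below are
placeholders, not real central.xnat.org identifiers:

    # sketch only: credentials and ids below are placeholders
    from pyxnat import Interface

    central = Interface(server='https://central.xnat.org',
                        user='USERNAME', password='PASSWORD')

    # see which projects are visible to this account
    print(central.select.projects().get())

    # pull the files of one scan down into pyxnat's local cache
    scan = central.select('/projects/MY_PROJECT/subjects/SUBJ01'
                          '/experiments/SESSION01/scans/1')
    for f in scan.resources().files():
        local_path = f.get()   # downloads the file, returns its local path
        print(local_path)

that way the heavy data stays on the xnat server and only the files a given
test actually needs come down to the local machine.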

cheers,

satra



On Tue, Dec 7, 2010 at 10:26 AM, Matthew Brett <matthew.brett at gmail.com> wrote:

> Hi Rufus and all,
>
> We (nipy folks [1], and neurodebian folks [2], and maybe others) have been
> thinking a little bit about what we wanted from a data package
> implementation.
>
> First - an apology.  I have tried to explore datapkg, but rather
> superficially.
> What we've done, in the main, is to try and think out what we mean by
> stuff, and what we want, and we're slowly then coming back to what y'all
> have done.
>
> In our first implementation of data packages, before we knew about
> datapkg, we did something extremely simple (but nevertheless not very
> good).  If you're interested, the implementation is in nibabel [3].  After
> we'd half-heartedly used that for a while, it became obvious that it was too
> clumsy and a little difficult to understand, for the simple case where you
> want to unpack files somewhere and point the code at the files.
>
> Now we're thinking about what we really want.  The result of various
> discussions ended up in the attached document ``data_pkg_discuss.rst``.  As
> the name suggests, it's trying to clarify various ideas we had about what is
> what.
>
> Now onto something real - use cases...
>
> We have - for example - a smallish package for reading image data -
> nibabel.  We want to be able to use optional data packages from within
> nibabel.  In particular, we wanted packages of test data - images in various
> formats - that are too large to include in the code repository.  Here are
> some things we wanted:
>
> * No dependency for nibabel on the data packaging code.  That is, we wanted
>   to be able to *use* installed data packages without having to install -
>   say - ``datapkg``.  This is obviously not essential, but desirable.  We're
>   less concerned about having to depend on - say - ``datapkg`` for
>   installing the data, or modifying the data packages.  Having said that,
>   it would surely help adoption of a standard packaging system if it were
>   easy to implement a packaging protocol outside of the canonical
>   implementation in - say - ``datapkg`` (there's a rough sketch of what we
>   mean just after this list).
> * Support for data package versions.  We expect to have several versions of
>   nibabel out in the wild, and maybe several versions of nibabel on a
>   single machine.  The versions of nibabel may well need different versions
>   of the data packages to run their tests.  Even if there is just one
>   version on the computer, it might be an older version that wants a
>   version of the data package that is older than the current version.  Thus
>   we want to be able to ask for different versions of a data package, and
>   to have several versions of a package installed at any one time.
> * Support for user and system installs of data.  As for python package
>   installs, we expect some of our packages to be installed system-wide and
>   available for all users, and others to be installed just for a single
>   user.  We want to be able to install data with the same distinction, so
>   that system-wide packages can see system-wide data.  It should be
>   possible for an individual piece of code to find an individual data
>   package, whether it is installed system-wide or only for the user.
> * Not of urgent importance for us, but it would be good to be able to sign
>   the packages with a trusted key, as for Debian packages.
>
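> To make that concrete, here's a very rough sketch of the sort of consumer
> code we have in mind - the directory layout, file names and ini keys below
> are only illustrative, not a proposal:
>
>     import os
>     try:
>         import configparser                  # python 3
>     except ImportError:
>         import ConfigParser as configparser  # python 2
>
>     def find_data_package(name, min_version=None):
>         """Find an installed data package without importing datapkg.
>
>         Looks for <root>/<name>/<version>/meta.ini under a per-user root
>         and then a system root, and returns the directory of the newest
>         acceptable version, or None.
>         """
>         roots = [os.path.expanduser('~/.data-packages'),   # user install
>                  '/usr/share/data-packages']               # system install
>         for root in roots:
>             pkg_dir = os.path.join(root, name)
>             if not os.path.isdir(pkg_dir):
>                 continue
>             # naive string sort - real code would parse version numbers
>             for version in sorted(os.listdir(pkg_dir), reverse=True):
>                 meta = os.path.join(pkg_dir, version, 'meta.ini')
>                 if not os.path.isfile(meta):
>                     continue
>                 if min_version is not None and version < min_version:
>                     continue
>                 cfg = configparser.ConfigParser()
>                 cfg.read(meta)
>                 # check the ini file really describes this package
>                 if (cfg.has_option('package', 'name') and
>                         cfg.get('package', 'name') == name):
>                     return os.path.join(pkg_dir, version)
>         return None
>
> Something that small is all we would want a consuming package like nibabel
> to carry; ``datapkg`` (or whatever tool) would be responsible for writing
> that layout in the first place.
>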
> For these various reasons we tried to spec out what we thought we would
> need in the attached ``data_pkg_uses.rst``.  I've also attached a script
> referenced in that page, ``register_me.py`` - as ``register_me.txt``.
>
> Given my relative ignorance of ``datapkg``, I'll try to summarize the
> differences I see from the current ``datapkg``:
>
> * I can't see support for data package versioning in ``datapkg`` - but I
>   might have missed it.
> * As far as I can see, there isn't a separation of system and user installs,
>   in that there seems to be a (by default) sqlite 'repository' (right term?)
>   that knows about the packages a user has installed, but I could not find
>   an obvious canonical way to pool system and user installation information.
>   Is that right?
> * Because the default repository is sqlite, anyone trying to read the
>   installations that ``datapkg`` has made will need sqlite or something
>   similar (see the snippet after this list).  They'll likely have this if
>   they are using a default python installation, but not necessarily if they
>   are using another language or a custom python install.
>
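> To illustrate that last point - reading a sqlite index is easy enough from
> python's standard library, but you need to know the schema (the table and
> column names below are made up, not ``datapkg``'s real ones), and it is a
> good deal more awkward from shell, matlab or C than a flat text file:
>
>     import os
>     import sqlite3
>
>     # made-up path and schema, for illustration only
>     index = os.path.expanduser('~/.datapkg/index.db')
>     conn = sqlite3.connect(index)
>     for name, version, path in conn.execute(
>             'SELECT name, version, path FROM packages'):
>         print('%s %s -> %s' % (name, version, path))
>     conn.close()
>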
> Are these right?  Do our use cases make sense to y'all?
>
> We'd love to work together on stuff if that makes sense to you too...
>
> See you,
>
>
> Matthew (for various of us).
>
>
>
> [1] http://nipy.org
> [2] http://neuro.debian.net
> [3] http://nipy.org/nibabel/devel/data_pkg_design.html
>

