[okfn-dev] [Nipy-devel] data package source
Satrajit Ghosh
satra at mit.edu
Wed Dec 8 19:17:35 UTC 2010
hi matthew and others,
Regression testing might require large real data-sets, and we should be
careful about the size of the data-sets that this package will provide.
One option is to use XNAT as an alternative for storing the data (on
central.xnat.org); the data-pkg would then simply use pyxnat to
query/retrieve the relevant data to the local machine (i'm cc'ing yannick).
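
As a rough sketch - server credentials, project, subject, experiment, and
file names below are all placeholders, and the exact pyxnat calls may
differ between versions - the retrieval side could look something like:

    import pyxnat

    # connect to the central XNAT server (credentials are placeholders)
    central = pyxnat.Interface(server='https://central.xnat.org',
                               user='USERNAME', password='PASSWORD')

    # walk down to a scan - project / subject / experiment / scan ids are
    # invented here, just to show the shape of the query
    scan = (central.select.project('NIBABEL_TESTDATA')
                          .subject('subj01')
                          .experiment('session01')
                          .scan('1'))

    # pull one file of the scan's NIfTI resource down to the local machine
    # (resource and file names are again placeholders)
    scan.resource('NIFTI').file('anat.nii.gz').get('/tmp/anat.nii.gz')
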
Please also consider the connectome file format as a way of describing the
stored data.
cheers,
satra
On Tue, Dec 7, 2010 at 10:26 AM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi Rufus and all,
>
> We (nipy folks [1], and neurodebian folks [2], and maybe others) have been
> thinking a little bit about what we wanted from a data package
> implementation.
>
> First - an apology. I have tried to explore datapkg, but rather
> superficially. What we've done, in the main, is to try and think out what
> we mean by stuff, and what we want, and we're slowly then coming back to
> what y'all have done.
>
> In our first implementation of data packages, before we knew about
> datapkg, we did something extremely simple (but nevertheless not very
> good). If you're interested, the implementation is in nibabel [3]. After
> we'd half-heartedly used that for a while, it became obvious that it was
> too clumsy and a little difficult to understand, for the simple case where
> you want to unpack files somewhere and point the code at the files.
>
> Now we're thinking about what we really want. The result of various
> discussions ended up in the attached document ``data_pkg_discuss.rst``.
> As the name suggests, it tries to clarify various ideas we had about what
> is what.
>
> Now onto something real: use cases...
>
> We have - for example - a smallish package for reading image data -
> nibabel. We want to be able to use optional data packages from within
> nibabel. In particular, we wanted packages of test data - images in
> various formats - that are too large to include in the code repository.
> Here are some of the things we wanted:
>
> * No dependency for nibabel on the data packaging code. That is, we wanted
>   to be able to *use* installed data packages without having to install -
>   say - ``datapkg``. This is obviously not essential, but desirable. We're
>   less concerned about having to depend on - say - ``datapkg`` for
>   installing the data, or for modifying the data packages. Having said
>   that, it would surely help adoption of a standard packaging system if it
>   were easy to implement the packaging protocol outside of the canonical
>   implementation in - say - ``datapkg`` (there's a rough sketch of this
>   kind of datapkg-free lookup just after this list).
> * Support for data package versions. We expect to have several versions of
>   nibabel out in the wild, and maybe several versions of nibabel on a
>   single machine. The versions of nibabel may well need different versions
>   of the data packages to run their tests. Even if there is just one
>   version on the computer, it might be an older version that wants a
>   version of the data package older than the current one. Thus we want to
>   be able to ask for different versions of a data package, and to be able
>   to have several versions of a package installed at any one time.
> * Support for user and system installs of data. As for python package
>   installs, we expect some of our packages to be installed system-wide and
>   available for all users, and others to be installed just for a single
>   user. We want to be able to install data with the same distinction, so
>   that system-wide packages can see system-wide data. It should be
>   possible for an individual piece of code to find an individual data
>   package, whether it is installed system-wide or only for the user.
> * Not of urgent importance for us, but it would be good to be able to sign
>   the packages with a trusted key, as for Debian packages.
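>
> To make the first and third points above a bit more concrete, here is a
> very rough sketch of the kind of ``datapkg``-free lookup we have in mind.
> The directory names, the ``config.ini`` file name, its section and field
> names, and the on-disk layout are all invented for illustration - not a
> proposal for the actual format:
>
>     import os
>     from configparser import ConfigParser  # 'ConfigParser' module on python 2
>
>     # Hypothetical install locations - a real spec would pin these down.
>     SEARCH_PATHS = ['/usr/share/data-packages',              # system-wide
>                     os.path.expanduser('~/.data-packages')]  # per-user
>
>     def _as_tuple(version):
>         # naive numeric comparison - good enough for a sketch
>         return tuple(int(part) for part in version.split('.'))
>
>     def find_data_package(name, min_version='0'):
>         """Return path to the newest acceptable install of `name`, or None.
>
>         Uses only the standard library - no dependency on ``datapkg``.
>         Assumed layout: <base>/<name>/<version>/config.ini, where the ini
>         file carries at least a ``version`` field in a [package] section.
>         """
>         best = None
>         for base in SEARCH_PATHS:
>             pkg_dir = os.path.join(base, name)
>             if not os.path.isdir(pkg_dir):
>                 continue
>             for subdir in os.listdir(pkg_dir):
>                 meta = os.path.join(pkg_dir, subdir, 'config.ini')
>                 if not os.path.isfile(meta):
>                     continue
>                 cfg = ConfigParser()
>                 cfg.read(meta)
>                 version = cfg.get('package', 'version')
>                 if _as_tuple(version) < _as_tuple(min_version):
>                     continue
>                 if best is None or _as_tuple(version) > _as_tuple(best[0]):
>                     best = (version, os.path.join(pkg_dir, subdir))
>         return None if best is None else best[1]
>
>     # e.g. data_dir = find_data_package('nibabel-testdata', min_version='0.2')
>
> Something of that shape would let code find system-wide or per-user data
> and pick an acceptable version with nothing beyond the standard library.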
>
> For these various reasons we tried to spec out what we thought we would
> need in the attached ``data_pkg_uses.rst``. I've also attached a script
> referenced in that page, ``register_me.py`` - as ``register_me.txt``.
>
> Given my relative ignorance of ``datapkg``, I'll try to describe the
> differences I see from the current ``datapkg``:
>
> * I can't see support for data package versioning in ``datapkg`` - but I
>   might have missed it.
> * As far as I can see, there isn't a separation of system and user
>   installs, in that there seems to be a (by default) sqlite 'repository'
>   (right term?) that knows about the packages a user has installed, but I
>   could not find an obvious canonical way to pool system and user
>   installation information. Is that right?
> * Because the default repository is sqlite, anyone trying to read the
>   installations that ``datapkg`` did will need sqlite or something
>   similar. They'll likely have this if they are using a default python
>   installation, but not necessarily if they are using another language or
>   a custom python install (see the short sketch just below).
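>
> For what it's worth, reading such a registry from Python itself is only a
> few lines with the standard library - the path, table and column names
> below are pure guesses rather than the real ``datapkg`` schema, but they
> show what a consumer in another language (or without sqlite bindings)
> would have to reimplement:
>
>     import os
>     import sqlite3
>
>     # NB: path, table and column names are invented for illustration - we
>     # have not inspected the actual ``datapkg`` registry layout.
>     conn = sqlite3.connect(os.path.expanduser('~/.datapkg/registry.db'))
>     for name, path in conn.execute('SELECT name, path FROM packages'):
>         print(name, path)
>     conn.close()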
>
> Are these right? Do our use cases make sense to y'all?
>
> We'd love to work together on stuff if that makes sense to you too...
>
> See you,
>
>
> Matthew (for various of us).
>
>
>
> [1] http://nipy.org
> [2] http://neuro.debian.net
> [3] http://nipy.org/nibabel/devel/data_pkg_design.html
>
> _______________________________________________
> Nipy-devel mailing list
> Nipy-devel at neuroimaging.scipy.org
> http://mail.scipy.org/mailman/listinfo/nipy-devel
>
>