[ckan-dev] Hierarchical folder structure for a dataset

Damian Steer D.Steer at bristol.ac.uk
Thu Aug 10 18:24:45 UTC 2017

> On 9 Aug 2017, at 06:29, Prashant Gupta <p.gupta at auckland.ac.nz> wrote:
> Hi,
> I am using CKAN to serve as an instrument (e.g. mass spec) data service, where we may ingest instrument data directly into CKAN – so we have a copy of raw data to be stored and shared, and later to be published and archived. The problem I am facing is the way CKAN stores its datasets and resources. For instrument data, it is vital to retain the folder structure and the resources (data, metadata and config files) to be in the correct folder. Otherwise the analysis software would have issues analysing it. 
> Is there a way CKAN may allow to store dataset and resources in a way that when it is downloaded, the folder structure may be retained somehow, and resources are in their correct folder?

Hi Prashant,

Very familiar story :-)

At the University of Bristol we had the same issue. We ended up using package relationships (parent / child) to represent folder structures. You can see a fairly extreme example at [1].

(We only use CKAN as a catalogue - the data is held externally - and we have a tool that generates the packages using the CKAN web API.)
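
A rough sketch of how such a tool might plan its API calls, offered as an illustration rather than the Bristol implementation. It walks a local directory tree and emits payloads for CKAN's `package_create` and `package_relationship_create` actions, linking each subfolder to its parent with a `child_of` relationship. The naming scheme (`package_name`, the `prefix` argument) and the use of an extra field `level=top` on the root package are assumptions made for the example; files are not turned into resources here, since where the data lives depends on the site.

```python
import os
import re


def package_name(prefix, rel):
    """Hypothetical naming scheme: prefix plus a slug of the relative path."""
    if rel in (".", ""):
        return prefix
    slug = re.sub(r"[^a-z0-9_-]+", "-", rel.lower()).strip("-")
    return prefix + "-" + slug


def plan_actions(root, prefix):
    """Return (action, payload) pairs describing one package per folder.

    Nothing is sent to CKAN; the caller would POST each payload to
    /api/3/action/<action> with an API key.
    """
    actions = []
    root = os.path.abspath(root)
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # deterministic order; parents come before children
        rel = os.path.relpath(dirpath, root)
        name = package_name(prefix, rel)
        payload = {"name": name, "title": os.path.basename(dirpath)}
        if rel == ".":
            # Assumption: mark the top level with an extra field so that
            # browsing can be restricted to top-level packages.
            payload["extras"] = [{"key": "level", "value": "top"}]
        actions.append(("package_create", payload))
        if rel != ".":
            parent = package_name(prefix, os.path.dirname(rel))
            actions.append((
                "package_relationship_create",
                {"subject": name, "object": parent, "type": "child_of"},
            ))
    return actions
```

Calling `plan_actions("/data/run42", "run42")` on a folder containing `raw/` and `config/` would yield three `package_create` payloads and two `child_of` relationships. Note that CKAN limits package names in length, so deep trees may need a shorter naming scheme than the one assumed here.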

Making it work required a fair amount of customisation:

* Tag top level packages as ‘level=top’ [2] so browsing works over the top levels rather than showing all subfolders.
* Generate and cache the tree you see in [1]. It can be expensive to generate.

Archiving is an option (and we do zip as well - see the ‘Complete download’ link), however it does obfuscate the dataset. With the folders exposed as packages you can, for example, search for ‘bedes’ [3] and find images of postcards. Keeping the structure visible also lets the user grab just the bits they need.

On the other hand we do recommend zipping (or probably 7zip in future) in cases where the individual files and directories don’t really make sense except as a whole. For example [4] contains a large number of images that represent slices through a sample. Individually they are very dull.

Hope this helps,

Damian Steer

[1] <https://data.bris.ac.uk/data/dataset/upjtf9os1dzr154phmgvrupib>
[2] <https://data.bris.ac.uk/data/dataset?level=top>
[3] <https://data.bris.ac.uk/data/dataset?q=bedes>
[4] <https://data.bris.ac.uk/data/dataset/37q0cntawxcq1rkktq3e9mr1p>


Damian Steer
Senior Technical Researcher
Research IT
+44 (0) 117 39 41724
