[ckan-dev] Dealing with large datasets plus slow loading resource pages

Philip Cross phil.cross at bristol.ac.uk
Fri May 23 09:19:48 UTC 2014


For interest, we have implemented a suggestion made on the list for
dealing with datasets that contain large numbers of resources, where the
package pages were loading very slowly. We have created multiple
packages per dataset: each package represents a folder in the separate
datastore we are pointing at, and the packages are linked via
child-parent relationships.
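
As a rough illustration of the linking step, the sketch below walks a
folder tree and builds one payload per parent/child pair in the shape
CKAN's package_relationship_create action accepts (subject, object,
type). The folder layout and package names are made up for the example,
and the code only builds the dictionaries; it does not call the API or
reflect how our own loader is written.

```python
# Sketch only: builds payloads shaped for CKAN's
# package_relationship_create action. Package names are hypothetical.

def relationship_payloads(tree, parent=None):
    """Walk a nested {package_name: {child_name: {...}}} mapping and
    return one 'child_of' relationship dict per parent/child pair."""
    payloads = []
    for name, children in tree.items():
        if parent is not None:
            payloads.append(
                {"subject": name, "object": parent, "type": "child_of"}
            )
        # Recurse into the sub-folders of this package.
        payloads.extend(relationship_payloads(children, parent=name))
    return payloads


# Hypothetical dataset with one top-level package and three folders.
folders = {
    "survey-2013": {
        "survey-2013-raw": {},
        "survey-2013-derived": {"survey-2013-plots": {}},
    }
}

for payload in relationship_payloads(folders):
    print(payload)
```

Each dictionary could then be POSTed to the action API by whatever
client is in use; the top-level package never appears as a subject
because it has no parent.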

The main issue we faced with this was the confusing number of results
that come up for the /datasets/ search, so we had to introduce a 'top
level' metadata element and filter searches on it by default. We also
had to introduce a caching mechanism to store the generated tree
structures you can see in the top level packages.
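
To make the two ideas above concrete, here is a minimal sketch. The
first function adds a default filter query to a search-parameters
dictionary, the kind of thing a plugin could do from the
IPackageController before_search hook; the second is a small cache that
rebuilds a tree only when a caller-supplied version token changes. The
field name extras_top_level, the class name TreeCache and the choice of
version token are all assumptions made for the example, not a
description of our actual code.

```python
# Sketch only: 'extras_top_level' is a hypothetical Solr field name.
TOP_LEVEL_FIELD = "extras_top_level"


def add_top_level_filter(search_params):
    """Return a copy of search_params restricted to top-level packages,
    unless the caller already filters on the top-level field."""
    params = dict(search_params)
    fq = params.get("fq", "")
    if TOP_LEVEL_FIELD not in fq:
        params["fq"] = (fq + " " + TOP_LEVEL_FIELD + ":true").strip()
    return params


class TreeCache(object):
    """Keep a generated tree per package until its version token changes.

    The token is whatever the caller considers a change marker, for
    example a modification timestamp (an assumption, not CKAN behaviour).
    """

    def __init__(self):
        self._store = {}

    def get(self, package_id, version, build):
        cached = self._store.get(package_id)
        if cached is None or cached[0] != version:
            # Cache miss or stale entry: rebuild and remember the result.
            cached = (version, build())
            self._store[package_id] = cached
        return cached[1]


print(add_top_level_filter({"q": "rainfall"}))
```

A search that already names the top-level field is left alone, so child
packages can still be found when someone asks for them explicitly.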

The repository is still not public but can be seen at:

http://databris-ui.ilrt.bris.ac.uk/

A prime example of a large dataset is:
http://databris-ui.ilrt.bris.ac.uk/dataset/13kidnrls4jnl1m806eyfd8h6z

We still have issues with folders that contain too many resources such as:
http://databris-ui.ilrt.bris.ac.uk/dataset/3dddbb60a6e97ec97b67af15e0ab36d9

where the page is taking about 50 seconds to load.

There is an interesting further problem where the resource pages for
these large packages are also loading very slowly, e.g.

http://databris-ui.ilrt.bris.ac.uk/dataset/3dddbb60a6e97ec97b67af15e0ab36d9/resource/7d60d92c-e684-4619-87e7-16744772ea2a

- is this because the package is being loaded first in the background,
with all the other resources' metadata?
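
One way to test this guess would be to time the two action API calls
separately. The sketch below is only a measuring aid: it builds the
package_show and resource_show URLs for API version 3 and times
whatever fetch function is passed in. The helper names and the idea of
injecting the fetch function are assumptions for the example, and no
timings are claimed here.

```python
import time


def action_url(site, action, object_id):
    """Build a CKAN action API (v3) URL for a single object id."""
    return "{0}/api/3/action/{1}?id={2}".format(
        site.rstrip("/"), action, object_id
    )


def timed(fetch, url):
    """Call fetch(url) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def compare(fetch, site, package_id, resource_id):
    """Time package_show against resource_show for the same site."""
    return {
        "package_show": timed(
            fetch, action_url(site, "package_show", package_id)
        ),
        "resource_show": timed(
            fetch, action_url(site, "resource_show", resource_id)
        ),
    }
```

With a real fetch function such as
lambda u: urllib.request.urlopen(u).read(), a resource_show call that
is fast while the resource page is slow would point towards the page
loading the whole package.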

We are still using version 2.0 but I'm assuming there wouldn't be a
speed improvement with 2.2.

Our solution helps but is not ideal and I think the issue of large
numbers of resources does still need addressing.

Cheers,
Phil


---------------------------------
Phil Cross
Senior Technical Researcher
IT Services R&D/ILRT
University of Bristol
8 - 10 Berkeley Square
Bristol, BS8 1HH
Tel: +44 (0)117 331 4391
Fax: +44 (0)117 331 4396
E-mail: phil.cross at bristol.ac.uk
URL: http://www.bris.ac.uk/ilrt/people/person/philip-a-cross
Skype: philip_cross

Please note I work for Bristol University on Tuesdays, Thursdays and Fridays
and I may not be able to respond to emails received on other days.
-----------------------------------------------


