[ckan-dev] performance, caching and authz.

David Raznick kindly at gmail.com
Tue Jun 14 09:14:33 UTC 2011


Hello All

I have been steadily looking at performance in ckan and I now have a
decent sense of what areas need to be looked at.

My testing has mainly been around synchronous calls, based on requests
that have been coming into ckan.net.  I have taken a sample of 1000 of them.
For the time being I have excluded api requests as they are in essence
easily cacheable.  I have only taken requests from behind the proxy cache,
in order to get a better spread of possible requests.  I want to
improve performance in the worst case.

Normal run.  No cache.
394 seconds

I could not get any setting of the current cache options to run
significantly faster, even using the 'memory' cache.  There were clearly not
many repeated requests in the sample.
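
One way to check the "not many repeated requests" observation is to compute
the best hit ratio a cache keyed on URL alone could reach for a given sample.
This is only a sketch: the function name and the sample URLs are made up for
illustration and are not taken from the ckan.net logs.

```python
from collections import Counter


def max_cache_hit_ratio(urls):
    """Upper bound on the hit ratio of a cache keyed on URL alone."""
    counts = Counter(urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # The first request for each distinct URL must miss; every repeat could hit.
    return (total - len(counts)) / total


# Hypothetical sample, not real ckan.net data.
sample = ["/package/a", "/package/b", "/package/a", "/tag/x"]
print(max_cache_hit_ratio(sample))  # 0.25
```

If this number is low for the real sample, no cache setting can help much
until the cache is pre-filled or shared across more traffic.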

In memory cache.
382 seconds

The largest share of queries and processing comes from authz.  So I decided
to remove the authz checks to see what would happen.

removing authz only
252 seconds

Genshi is not that fast.  I removed all rendering to see what that meant. So
we return nothing but still do all the background processing.

no rendering and no template level authz
190 seconds
no rendering and no authz
161 seconds

This 161 seconds is mainly sqlalchemy with 63 seconds of it actually
database execution time.

Out of 161 seconds ...
41 seconds are due to search.  These take around 0.1 seconds per search
(378 searches in the sample).
35 seconds for package and tag reads.  These two take about 0.15 seconds
each.
22 seconds for the revision list, which is the atom feed.  These take about 5
seconds each, as there were only 4.
16 seconds for package_edit.  These take around 0.16 seconds each.
10 seconds on the home page, at 0.34 seconds each.
9 seconds of querying for users in the base controller.
.. the other 30ish seconds are many other little things that are too small
to account for.
These times will be increased by a fair chunk if you reintroduce authz.

Suggestions...

There are some very slow queries (and too many repeated ones) around, which
would be good to speed up.  However, the main problem currently is with our
fairly dynamic pages and our very flexible authz system.
It is very difficult to do decent cache invalidation because of this.  I have
been trying to think of a decent way, but anything I come up with has ended up
overly complicated and not great to maintain.  I also think we should aim for
a system that does not have to be manually tuned for certain deployments and
just works and works fast.

I have come up with possible ideas that could solve this.

1. Do not put any content that varies from user to user in our templates.
We could add all this content with ajax later (this would mainly consist of
authorization stuff).  This would mean pages would be much more cacheable
and we could even prefill our cache or even have static files that we
generate.

2. Have a special cache just for 'visitors'.  We could catch them low down
in the middleware stack and serve what we have cached.  The same pre-filling
as in option 1 applies here.
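
To make option 2 concrete, here is a rough sketch of what a visitor cache in
the WSGI middleware stack could look like.  It is not existing ckan code.  The
class name, the in-memory dict and the "auth_tkt" cookie name used to detect a
logged-in user are all assumptions for illustration, and it does no
invalidation at all.

```python
class VisitorCacheMiddleware:
    """Serve cached GET pages to anonymous visitors before the app runs."""

    def __init__(self, app, login_cookie="auth_tkt"):
        self.app = app
        self.login_cookie = login_cookie  # assumed name of the login cookie
        self.cache = {}  # (path, query string) -> (status, headers, body)

    def is_visitor(self, environ):
        # Treat any request without the login cookie as an anonymous visitor.
        return self.login_cookie + "=" not in environ.get("HTTP_COOKIE", "")

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") != "GET" or not self.is_visitor(environ):
            # Logged-in users and non-GET requests always reach the real app.
            return self.app(environ, start_response)

        key = (environ.get("PATH_INFO", ""), environ.get("QUERY_STRING", ""))
        if key not in self.cache:
            captured = {}

            def capture(status, headers, exc_info=None):
                captured["status"] = status
                captured["headers"] = list(headers)
                # Legacy write() callable; apps that use it are not supported.
                return lambda data: None

            result = self.app(environ, capture)
            try:
                body = b"".join(result)
            finally:
                if hasattr(result, "close"):
                    result.close()

            if not captured["status"].startswith("200"):
                # Pass errors and redirects through without caching them.
                start_response(captured["status"], captured["headers"])
                return [body]
            self.cache[key] = (captured["status"], captured["headers"], body)

        status, headers, body = self.cache[key]
        start_response(status, headers)
        return [body]
```

Wrapping the application as VisitorCacheMiddleware(app) means the second
anonymous request for the same URL never touches authz, sqlalchemy or Genshi.
Pre-filling would amount to populating the cache dict ahead of time.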

This is just for info.  I will make it into a crep if we decide to go down
one of these paths.  I would be very interested in feedback and any other
views.

David