[ckan-dev] CKAN performance

David Raznick david.raznick at okfn.org
Tue Feb 26 00:04:03 UTC 2013


Hello

There is a Redis page cache option, but it only works in situations where
we can be certain that there is no variability on the page.

The only requests that can easily be cached this way are the package show
pages when people are not logged in, i.e. on package change, invalidate the
cache.  Pages like the search results or tags or publisher information
could be invalidated at the same time, but it is difficult to catch them
all.  As such, the package read pages did not seem worth treating as a
special case.

So my preferred approach is to cache all pages for, say, 20 minutes for all
non-logged-in users. This number can be higher if there is any worry about
load. This way no individual page, wherever it is, can be hit too hard, and
what gets cached will be based on actual usage rather than trying to guess
the pages in advance.  This will catch cases like advertised search terms
linked to a particular facet.  If this cache is too stale, it will be
trivial enough to refresh it, but users who are not logged in will not miss,
say, 20 minutes' worth of updates.  Diversity in usage is most likely to come
from different search terms, and this would be hard to guess.
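
As a rough illustration of the time-based approach described above (not
CKAN's actual page cache code), the sketch below caches rendered pages per
URL for a fixed period and bypasses the cache for logged-in users. The class
name AnonymousPageCache, the render callback and the in-memory dict standing
in for Redis are all assumptions made for the example.

```python
import time


class AnonymousPageCache:
    """Time-based page cache for users who are not logged in (hypothetical)."""

    def __init__(self, ttl_seconds=20 * 60, clock=time.time):
        self.ttl_seconds = ttl_seconds  # 20 minutes by default
        self.clock = clock              # injectable so expiry can be tested
        self._store = {}                # full URL -> (expires_at, html)

    def get_page(self, url, logged_in, render):
        # Logged-in users always get a fresh page and never touch the cache.
        if logged_in:
            return render(url)

        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]             # still fresh: serve the cached copy

        # Missing or expired: render once and keep it for ttl_seconds.
        html = render(url)
        self._store[url] = (now + self.ttl_seconds, html)
        return html

    def clear(self):
        # Manual refresh if the cache is judged too stale.
        self._store.clear()


if __name__ == "__main__":
    cache = AnonymousPageCache()
    page = cache.get_page("/dataset?q=water", False, lambda u: "<html>%s</html>" % u)
    print(page)
```

Because the key is the full URL including the query string, whichever search
terms and facets people actually request are the ones that end up cached, so
nothing has to be guessed in advance.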

David



On Mon, Feb 25, 2013 at 10:08 PM, David Read <david.read at hackneyworkshop.com> wrote:

> These are good ideas, thanks David - we'll give them a bash tomorrow.
>
> Btw did you try caching pages (in Redis) and invalidating when data
> changes? We're considering doing this with our Varnish.
>
> Dave
> On Feb 25, 2013 8:26 PM, "David Raznick" <david.raznick at okfn.org> wrote:
>
>> Hello David
>>
>> I am not free tomorrow as I have family things on.
>>
>> However, I can give you a few things you could look at.  I have CCed
>> ckan-dev for interest's sake.
>>
>> *Uploading performance*
>>
>> A big factor in this is that we do a Solr commit for every single dataset.
>> For some instances we have stopped this and just do a commit every 30
>> seconds or so. In Solr 4 there is a config option for this, so you do not
>> need an external cron to do it (and a soft commit option, which can be done
>> about every second).
>>
>> We were doing a lot of datasets (500k) for geodatagov and it got very
>> slow after a while. The following helped a lot.
>>
>> Stopping some db constraints:
>> https://github.com/okfn/ckanext-geodatagov/blob/master/constraints.sql
>>
>> and changing the following indexes:
>> https://github.com/okfn/ckanext-geodatagov/blob/master/what_to_alter.sql
>>
>> These are not in CKAN master yet but we will be adding some soon.
>>
>> *Read performance*
>>
>> As we store the whole package_dict in the search index, it is best to use
>> that where you can, as it makes things a lot faster.  In 2.0 for tag_show
>> and group_show, rather than package_dictizing every associated dataset, we
>> just did a search query.
>>
>> Genshi is much slower than Jinja. The old auth model queries are slower
>> too.
>>
>> When searching, even if you are getting stuff out of the search index, we
>> are still checking the existence of each dataset.  This could be sped up by
>> doing a bulk check of all packages returned or just trusting the search
>> index to be in sync.
>>
>> That is all I can think of for now.
>>
>> Good luck
>>
>> David