[ckan-dev] Performance CKAN / Configuration Parameters

Mon Aug 11 10:08:19 UTC 2014

Hello,

You may already be aware of these things, but just to throw in some ideas:

- A single process will only run on a single CPU, so setting 
'processes=2' means your CKAN application will use at most 2 of your 8 CPUs.
   This might be what you want (to reserve the other CPUs for 
Postgres/Jetty etc.), but it is good to keep in mind;

- I don't think ServerLimit affects mod_wsgi daemon mode - though I may be 
wrong about this;

- With processes=2 and threads=30, you will serve at most 2*30 = 60 
concurrent requests;

- It's important to remember that all requests, including those for 
static files, go through mod_wsgi.
   If each browser fires, say, 8 concurrent requests, then that leaves 
you with 2*30/8 (7 or 8) concurrent clients
   (this is very approximate: it will depend on the time taken for each 
request, client caching, etc., but
   it's a good way to get an idea of what is happening). The best way to 
deal with this is to add a caching
   server in front (say nginx or Varnish) so that static files are served from cache;

- You don't mention PostgreSQL settings, and whether you use the 
datastore (and if so with how many rows).
    On our setup (with the datastore and tables with over 3,000,000 
rows), PostgreSQL is the slow point.

    The default PostgreSQL settings are very conservative. The first 
thing to do there is to increase shared_buffers -
    the recommended value is about 25% of available memory. The next one 
to set is effective_cache_size; this
    should be roughly shared_buffers plus the size of the operating system's file cache.

    What will make a real difference for a large database is to set 
work_mem. This has to be tuned carefully, as you are
    setting the memory available for each operation in a query - so a 
query with 12 joins will use up to 12*work_mem.
    If you set this too low, then your sorts/joins will happen on disc - 
which can be very slow. If you set this too high, you might
    run out of memory!

    The best way to work this out is to enable slow query logging 
(log_min_duration_statement), and look for the slow queries. EXPLAIN ANALYZE will tell you
    how much memory they need, and whether the operations happen on disc 
or in memory. Increase work_mem to make them
    happen in memory (if possible).

- In postgres, you should also check the number of allowed connections 
(max_connections). Depending on your settings/plugins, CKAN may make more
   than one connection per request. With 2*30 workers, if each worker 
makes more than one connection then you can exceed that limit and new connections will be refused.
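
As a rough illustration of the arithmetic in the points above, here is a
minimal Python sketch. The values 2, 30, 8 and 24 GB come from this thread;
the 2 connections per request, the 64 MB work_mem and all function names are
assumptions chosen purely for illustration, and 100 is the usual PostgreSQL
default for max_connections. Treat the results as back-of-envelope estimates,
not as tuning recommendations.

```python
def max_concurrent_requests(processes, threads):
    """Upper bound on requests mod_wsgi daemon mode can handle at once."""
    return processes * threads


def approx_concurrent_clients(processes, threads, requests_per_client):
    """Very rough number of browsers served at once, if each browser
    fires `requests_per_client` requests in parallel."""
    return max_concurrent_requests(processes, threads) / requests_per_client


def connections_needed(processes, threads, connections_per_request):
    """PostgreSQL connections required if every worker thread is busy."""
    return max_concurrent_requests(processes, threads) * connections_per_request


def suggested_shared_buffers_mb(total_ram_mb):
    """About 25% of memory, as suggested above for shared_buffers."""
    return total_ram_mb // 4


def worst_case_work_mem_mb(work_mem_mb, operations_per_query, concurrent_queries):
    """Upper bound on memory used if every operation of every concurrent
    query uses its full work_mem allowance."""
    return work_mem_mb * operations_per_query * concurrent_queries


if __name__ == "__main__":
    processes, threads = 2, 30        # from the WSGIDaemonProcess line
    total_ram_mb = 24 * 1024          # 24 GB machine
    max_connections = 100             # usual PostgreSQL default (check yours)
    connections_per_request = 2       # assumption, depends on plugins
    work_mem_mb = 64                  # hypothetical setting

    workers = max_concurrent_requests(processes, threads)
    print("concurrent requests:", workers)
    print("approx. clients:", approx_concurrent_clients(processes, threads, 8))

    needed = connections_needed(processes, threads, connections_per_request)
    print("connections needed:", needed,
          "(exceeds max_connections)" if needed > max_connections else "(ok)")

    print("shared_buffers (MB):", suggested_shared_buffers_mb(total_ram_mb))
    print("worst-case work_mem use (MB):",
          worst_case_work_mem_mb(work_mem_mb, 12, workers))
```

With these example inputs the worst-case work_mem figure is far larger than
the RAM in the machine, which is why work_mem has to be raised carefully and
checked against real slow queries.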

I'm interested in hearing about anything else you find that affects 
performance, so please let us know!

Best Wishes,
Alice Heaton

On 06/08/14 18:54, hotz wrote:
> Hi all,
>
> we do performance tests with following setup:
> - CKAN 2.1.2
> - Search queries via the default ckan-portal and a web-portal
> - Ramp tests of 50,100,200...600...1000,...5000 users per second (!)
>   (we expect such numbers in the early on-line phase of our portal)
> - 24 GB RAM, 8 CPUs
>
> Following parameter configurations in:
> 1) apache2.conf
>  ServerLimits 300
>  MaxClients 300
>  for all occurrences
>
> 2) Jetty Java_Options:
> -Xms512M -Xmx4g
>
> 3) virtual host ckan_default:
> WSGIDaemonProcess ckan_default display-name=ckan_default processes=2 
> threads=30
>
> We get response time of ca. mean 30 seconds each of 600 concurrent 
> users per second.
> And several errors. Which altogether we feel bad with.
>
> The CPUs are 50% active, RAM uses only ca. 5GB (of the 24 GB).
> The ckan-portal and web-portal have same results.
>
>
> Is there somebody who can explain the above parameters and optimal 
> settings of them and their influences?
> E.g. do apache workers correspond to threads? Are there multiple jetty 
> processes or only one if CKAN is running?
> Has anybody experiences in this direction or hints to further 
> information?
>
> Best wishes,
> Lothar
>
>
>