[ckan-dev] Performance CKAN / Configuration Parameters

hotz hotz at informatik.uni-hamburg.de
Fri Aug 15 16:47:37 UTC 2014


Hi Alice,

thank you very much for your reply!
I'm still checking out things and will later answer in more detail.

For now:
- we raised the maximum number of Postgres connections (max_connections)
from the default of 100 to 120 on a 2 GB development machine, and to
1000 connections with 24 GB of shared memory on a 32 GB production
machine. This (of course) allowed more concurrent requests.
(When the connections run out, there is a log entry in CKAN and in the
DB log, which helped with tuning.)

- we moved from apache/prefork to apache/worker according to 
[http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html]. 
However, I'm still not sure if this is needed because
mod_wsgi runs in daemon mode in our case.

- with this configuration, on an 8-core machine with 24 GB RAM
(Postgres on a different machine, Apache with mod_wsgi, no datastore,
no Varnish), the following scenario gives the results shown below [1]:

Scenario: 128 concurrent users; each loads three pages (start, query,
detail of one dataset), waits 5 seconds, then starts again. This runs
for 5 minutes. The "users" are at two locations (Hamburg and Bremen).
--> Page load times are between 35s and 75s on average. :-(
--> but no failures. :-)

With 16 concurrent users, we got page loads way below 10s.
With more than 128 users we got failures, which still have to be analyzed.

Next steps: add Varnish, and tune the DB, Apache and WSGI according to
your suggestions.

Perhaps also try uwsgi instead of mod_wsgi (if CKAN allows it).

That's for now, best wishes and thanx again!
Lothar

[1]

===================================
5 minutes run
128 users in total (two locations, HH = Hamburg and HB = Bremen).
Three pages: start, query, show detail.
Two pages (start and query) are reported below.
One "test" means one run through the scenario.

                HH       HB
Users           64       64
Req/sec         37.9     31.3
#Tests          119      83
Page Start      38.9s    47.6s   (average)
Page Query      57.1s    75.8s   (average)
Failures        0        0

===================================
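
As a rough cross-check of the figures above, the Req/sec and #Tests rows
can be combined to estimate how many HTTP requests each page view
triggers. This is only a sketch under assumptions that the report does
not confirm: that Req/sec counts every HTTP request (including static
files), that it is an average over the full 5 minutes, and that each
counted test means exactly three page views. The function name is made
up for illustration.

```python
# Rough estimate of HTTP requests per page view from the load-test table.
# Assumptions (not confirmed by the test tool): Req/sec is averaged over the
# whole run and includes static files; every counted test is 3 page views.

RUN_SECONDS = 5 * 60      # "5 minutes run"
PAGES_PER_TEST = 3        # start, query, detail


def requests_per_page(req_per_sec: float, tests: int) -> float:
    """Total requests in the run divided by total page views."""
    total_requests = req_per_sec * RUN_SECONDS
    page_views = tests * PAGES_PER_TEST
    return total_requests / page_views


if __name__ == "__main__":
    for site, rps, tests in [("HH", 37.9, 119), ("HB", 31.3, 83)]:
        print(f"{site}: ~{requests_per_page(rps, tests):.0f} requests per page view")
```

If these assumptions hold, each page view causes a few dozen requests,
presumably mostly static files. That is the kind of load a cache such as
Varnish is meant to take off mod_wsgi.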


Apache configuration:

KeepAlive OFF

<IfModule mpm_worker_module>
     StartServers          50
     MinSpareThreads      25
     MaxSpareThreads      2500
     ThreadLimit          2500
     ThreadsPerChild      50
     ServerLimit         1300
     MaxClients          1300
     MaxRequestsPerChild  0
</IfModule>

WSGI:
processes=2 threads=120

===> This might be why we get failures when 256 users have to be
served: 2 processes x 120 threads gives only 240 request threads.
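
To make that suspicion concrete, here is a minimal sketch that compares
the number of request threads in the mod_wsgi daemon group with the
number of simulated users. It assumes that in the worst case every user
has a request in flight. The figure of 8 parallel requests per browser
is the example value from Alice's mail, not something measured here. The
function names are illustrative only.

```python
# Compare mod_wsgi daemon capacity with the number of simulated users.
# Assumption: in the worst case every user has a request in flight.


def daemon_slots(processes: int, threads: int) -> int:
    """Requests the daemon process group can handle at the same time."""
    return processes * threads


def clients_served(slots: int, requests_per_client: int) -> int:
    """Clients that fit if each one fires several requests in parallel."""
    return slots // requests_per_client


if __name__ == "__main__":
    slots = daemon_slots(processes=2, threads=120)   # WSGI settings above
    for users in (128, 256):
        status = "fits" if users <= slots else "exceeds capacity"
        print(f"{users} users vs {slots} slots: {status}")
    # 8 parallel requests per browser is the example value from Alice's mail
    print("clients at 8 requests each:", clients_served(slots, 8))
```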

1300 MaxClients/ 50 ThreadsPerChild = 26 processes

ps -dealf | grep apache | grep www_data | wc -l
--> 27 Apache processes (1 wait)

ps -dealf | grep ckan_default | grep www_data | wc -l
--> 2 ckan_default processes

===> This has to be aligned. It looks as if 26 Apache processes
send requests to 2 ckan_default processes. Main question: what are
the relations behind apache processes
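
One rough way to line the two sides up is to compare how many requests
Apache will accept with how many the ckan_default daemon processes can
work on at once. The sketch below only does that arithmetic with the
values from this mail. It assumes that in daemon mode each Apache worker
thread hands its request to a daemon thread and waits for the answer, so
surplus requests wait rather than run; that is an assumption about
mod_wsgi's daemon mode, not something measured here. The function names
are hypothetical.

```python
# Line up Apache worker capacity with mod_wsgi daemon capacity.
# Assumption: each Apache worker thread forwards one request to a daemon
# thread and waits for the reply; requests beyond the daemon capacity wait.


def apache_processes(max_clients: int, threads_per_child: int) -> int:
    """Number of worker MPM child processes (MaxClients / ThreadsPerChild)."""
    return max_clients // threads_per_child


def waiting_requests(concurrent: int, max_clients: int, slots: int) -> int:
    """Requests accepted by Apache that have no free daemon thread."""
    accepted = min(concurrent, max_clients)
    return max(0, accepted - slots)


if __name__ == "__main__":
    max_clients, threads_per_child = 1300, 50      # mpm_worker settings
    slots = 2 * 120                                # processes * threads
    print("Apache child processes:", apache_processes(max_clients, threads_per_child))
    print("Apache threads per daemon thread:", round(max_clients / slots, 1))
    print("waiting at 1300 concurrent requests:",
          waiting_requests(1300, max_clients, slots))
```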



Am 11.08.2014 12:08, schrieb Alice Heaton:
> Hello,
>
> You may already be aware of these things, but just to throw in some 
> ideas:
>
> - A single process will only run on a single CPU so by setting 
> 'processes=2' it means your CKAN application will only run on 2 CPUs.
>   This might be what you want (to reserve the other CPUs for 
> Postgres/Jetty etc.), but good to keep in mind;
>
> - I don't think ServerLimits affects wsgi daemon mode - though I may 
> be wrong about this;
>
> - With processes=2 and threads=30, you will serve at most 2*30 
> concurrent requests;
>
> - It's important to remember that all requests, including those for 
> static files, go through mod_wsgi.
>   If browsers are firing, say, 8 concurrent requests then that leaves 
> you with 2*30/8 concurrent clients
>   (this is very approximate, it will depend on the time taken for each 
> request, client caching, etc. however
>   it's a good way to get an idea of what is happening). The best way 
> to deal with this is to add a caching
>   server in front (say nginx or varnish) to ensure static files are 
> cached;
>
> - You don't mention PostgreSQL settings, and whether you use the 
> datastore (and if so with how many rows).
>    On our setup (with the datastore and tables with over 3,000,000 
> rows), PostgreSQL is the slow point.
>
>    The default PostgreSQL settings are very conservative. The first 
> thing to do there is to increase shared_buffers -
>    the recommended value is about 25% of available memory. The next 
> one to set is effective_cache_size, this
>    should be roughly shared_buffers plus the amount of system caches.
>
>    What will make a real difference for a large database is to set 
> work_mem. This has to be tuned carefully, as you are
>    setting the memory available for each operation in a query - so a 
> query with 12 joins will use up to 12*work_mem.
>    If you set this too low, then your sorts/joins will happen on disc - 
> which can be very slow. If you set this too high, you might
>    run out of memory!
>
>    The best way to work this out is to enable slow query logging, and 
> look for the slow queries. explain analyze will tell you
>    how much memory they need, and whether the operations happen on 
> disc or in memory. Increase work_mem to make them
>    happen in memory (if possible).
>
> - In postgres, you should also check the number of allowed 
> connections. Depending on your settings/plugins, CKAN may make more
>   than one connection per request. With 2*30 workers, if each worker 
> makes more than one connection then you will run out.
>
> I'm interested in hearing about anything else you find that affects 
> performance, so please let us know !
>
> Best Wishes,
> Alice Heaton
>
> On 06/08/14 18:54, hotz wrote:
>> Hi all,
>>
>> we are running performance tests with the following setup:
>> - CKAN 2.1.2
>> - Search queries via the default ckan-portal and a web-portal
>> - Ramp tests of 50,100,200...600...1000,...5000 users per second (!)
>>   (we expect such numbers in the early on-line phase of our portal)
>> - 24 GB RAM, 8 CPUs
>>
>> Following parameter configurations in:
>> 1) apache2.conf
>>  ServerLimits 300
>>  MaxClients 300
>>  for all occurrences
>>
>> 2) Jetty Java_Options:
>> -Xms512M -Xmx4g
>>
>> 3) virtual host ckan_default:
>> WSGIDaemonProcess ckan_default display-name=ckan_default processes=2 
>> threads=30
>>
>> We get a mean response time of ca. 30 seconds for each of 600 
>> concurrent users per second, plus several errors.
>> Altogether, we are not happy with this.
>>
>> The CPUs are 50% active, RAM uses only ca. 5GB (of the 24 GB).
>> The ckan-portal and web-portal have same results.
>>
>>
>> Is there somebody who can explain the above parameters and optimal 
>> settings of them and their influences?
>> E.g. do apache workers correspond to threads? Are there multiple 
>> jetty processes or only one if CKAN is running?
>> Has anybody experiences in this direction or hints to further 
>> information?
>>
>> Best wishes,
>> Lothar
>>
>>
>>
>
