[ckan-dev] Performance CKAN / Configuration Parameters
hotz
hotz at informatik.uni-hamburg.de
Fri Aug 15 16:47:37 UTC 2014
Hi Alice,
thank you very much for your reply!
I'm still checking out things and will later answer in more detail.
For now:
- we raised the maximum number of Postgres connections (max_connections)
from the default 100 to 120 on a 2 GB development machine, and to 1000
connections with 24 GB of shared memory on a 32 GB production machine.
This (of course) allowed more concurrent requests.
(If the connections run out, there is a log entry in CKAN and in the
DB log, which helped with tuning.)
- we moved from apache/prefork to apache/worker according to
[http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html].
However, I'm still not sure if this is needed because
mod_wsgi runs in daemon mode in our case.
- with this configuration we currently get the data shown below [1] on an
8-core machine with 24 GB RAM (Postgres is on a different machine; Apache
with mod_wsgi, no datastore, no varnish) in the following scenario:
Scenario: start with 128 concurrent users; each clicks through three pages
(start, query, detail of one dataset), waits 5 seconds, and starts again.
This runs for 5 minutes. The "users" are at two locations (Hamburg and
Bremen).
--> Page loading takes between 35 s and 75 s on average. :-(
--> but no failures. :-)
With 16 concurrent users, we got page loads well below 10 s.
With more than 128 users we got failures, which still have to be analyzed.
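For the max_connections change described above, here is a rough Python sketch of how the limit relates to the mod_wsgi settings given further down in this mail (processes=2, threads=120). The function names are made up for illustration, and conns_per_request is an assumption: CKAN may open more than one connection per request depending on settings and plugins. The 3 reserved connections are the Postgres default for superuser_reserved_connections.

```python
def required_connections(processes, threads, conns_per_request=1, reserved=3):
    """Upper bound on Postgres connections one CKAN host may hold open.

    reserved: slots Postgres keeps back for superusers
    (superuser_reserved_connections, default 3).
    """
    return processes * threads * conns_per_request + reserved


def fits(max_connections, **wsgi):
    """True if the worst case still fits below max_connections."""
    return required_connections(**wsgi) <= max_connections


print(required_connections(processes=2, threads=120))   # 243
print(fits(120, processes=2, threads=120))               # False
print(fits(1000, processes=2, threads=120))              # True
```

This is only a worst-case bound: it assumes every WSGI thread holds its connections at the same moment.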
Next steps: add varnish, and tune the DB, Apache and WSGI according to your
suggestions.
Perhaps use uwsgi instead of mod_wsgi (if CKAN allows it).
That's all for now, best wishes and thanks again!
Lothar
[1]
===================================
5-minute run
128 users starting (two locations, HH and HB).
Three pages: start, query, show detail.
Two pages (start and query) are reported below.
"Test" means one run through the scenario.

              HH       HB
Users         64       64
Req/sec       37.9     31.3
#Tests        119      83
Page Start    38.9 s   47.6 s   (average)
Page Query    57.1 s   75.8 s   (average)
Failures      0        0
===================================
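As a cross-check of the table, the average time one simulated user needs for one full scenario run can be estimated from the number of completed tests. This is only a sketch: it assumes all 64 users per location were active for the whole 5 minutes and it ignores runs that were still unfinished at the end. The function name is made up for illustration.

```python
def seconds_per_test(users, duration_s, completed_tests):
    """Average wall-clock seconds one user spends on one scenario run."""
    return users * duration_s / completed_tests


DURATION_S = 5 * 60  # 5-minute run

for site, users, tests in [("HH", 64, 119), ("HB", 64, 83)]:
    print(site, round(seconds_per_test(users, DURATION_S, tests), 1))
# HH 161.3
# HB 231.3
```

Each run covers three page loads plus the 5 second wait, so these figures are of the same order as the page averages reported above.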
Apache configuration:
KeepAlive OFF
<IfModule mpm_worker_module>
StartServers 50
MinSpareThreads 25
MaxSpareThreads 2500
ThreadLimit 2500
ThreadsPerChild 50
ServerLimit 1300
MaxClients 1300
MaxRequestsPerChild 0
</IfModule>
WSGI:
processes=2 threads=120
===> This might be the reason that we get failures if 256 users
have to be served.
1300 MaxClients/ 50 ThreadsPerChild = 26 processes
ps -dealf | grep apache | grep www_data | wc
--> 27 Apache processes (1 wait)
ps -dealf | grep ckan_default | grep www_data | wc
--> 2 ckan_default processes
===> This has to be aligned. It looks as if 26 Apache processes
send requests to 2 ckan_default processes. Main question: what are
the relations behind apache processes
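The arithmetic above can be written down as a small Python sketch. It follows the reading from the quoted reply below that the daemon group serves at most processes*threads concurrent requests. Treating each of the 256 users as exactly one in-flight request is a simplifying assumption, and the function names are made up for illustration.

```python
def apache_capacity(max_clients, threads_per_child):
    """Return (child processes, total worker threads) for the worker MPM."""
    processes = max_clients // threads_per_child
    return processes, processes * threads_per_child


def wsgi_capacity(processes, threads):
    """Requests the mod_wsgi daemon group can handle at the same time."""
    return processes * threads


procs, front = apache_capacity(max_clients=1300, threads_per_child=50)
back = wsgi_capacity(processes=2, threads=120)

print(procs, front, back)   # 26 1300 240
print(back >= 256)          # False: 256 simultaneous requests would have to queue
```

Under these assumptions the 1300 Apache worker threads are not the bottleneck; the 240 daemon threads are.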
On 11.08.2014 12:08, Alice Heaton wrote:
> Hello,
>
> You may already be aware of these things, but just to throw in some
> ideas:
>
> - A single process will only run on a single CPU so by setting
> 'processes=2' it means your CKAN application will only run on 2 CPUs.
> This might be what you want (to reserve the other CPUs for
> Postgres/Jetty etc.), but good to keep in mind;
>
> - I don't think ServerLimit affects wsgi daemon mode - though I may
> be wrong about this;
>
> - With processes=2 and threads=30, you will serve at most 2*30
> concurrent requests;
>
> - It's important to remember that all requests, including those for
> static files, go through mod_wsgi.
> If browsers are firing, say, 8 concurrent requests then that leaves
> you with 2*30/8 concurrent clients
> (this is very approximate, it will depend on the time taken for each
> request, client caching, etc. however
> it's a good way to get an idea of what is happening). The best way
> to deal with this is to add a caching
> server in front (say nginx or varnish) to ensure static files are
> cached;
>
> - You don't mention PostgreSQL settings, and whether you use the
> datastore (and if so with how many rows).
> On our setup (with the datastore and tables with over 3,000,000
> rows), PostgreSQL is the slow point.
>
> The default PostgreSQL settings are very conservative. The first
> thing to do there is to increase shared_buffers -
> the recommended value is about 25% of available memory. The next
> one to set is effective_cache_size; this
> should be roughly shared_buffers plus the amount of system caches.
>
> What will make a real difference for a large database is to set
> work_mem. This has to be tuned carefully, as you are
> setting the memory available for each operation in a query - so a
> query with 12 joins will use up to 12*work_mem.
> If you set this too low, then your sorts/joins will happen on disc -
> which can be very slow. If you set this too high, you might
> run out of memory!
>
> The best way to work this out is to enable slow query logging, and
> look for the slow queries. explain analyze will tell you
> how much memory they need, and whether the operations happen on
> disc or in memory. Increase work_mem to make them
> happen in memory (if possible).
>
> - In postgres, you should also check the number of allowed
> connections. Depending on your settings/plugins, CKAN may make more
> than one connection per request. With 2*30 workers, if each worker
> makes more than one connection then you will run out.
>
> I'm interested in hearing about anything else you find that affects
> performance, so please let us know !
>
> Best Wishes,
> Alice Heaton
>
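The PostgreSQL memory rules of thumb quoted above (shared_buffers at about 25% of RAM, up to one work_mem per operation in a query) can be turned into a rough worst-case estimate. This is a sketch only: the function name and the 4 MB work_mem value are example assumptions, and the worst case assumes every allowed connection runs a 12-operation query at the same moment, which is unlikely in practice.

```python
def pg_memory_sketch(ram_gb, max_connections, ops_per_query, work_mem_mb):
    """Return (suggested shared_buffers in GB, worst-case work_mem use in GB)."""
    shared_buffers_gb = 0.25 * ram_gb
    worst_case_work_gb = max_connections * ops_per_query * work_mem_mb / 1024
    return shared_buffers_gb, worst_case_work_gb


shared, worst = pg_memory_sketch(
    ram_gb=32,             # production machine from this thread
    max_connections=1000,  # production setting from this thread
    ops_per_query=12,      # the "12 joins" example from the quoted text
    work_mem_mb=4,         # example value, an assumption
)
print(shared, worst)  # 8.0 46.875
```

Even a small work_mem can in theory exceed the 32 GB of RAM at 1000 connections, which is why the value has to be tuned against the real slow queries.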
> On 06/08/14 18:54, hotz wrote:
>> Hi all,
>>
>> we are running performance tests with the following setup:
>> - CKAN 2.1.2
>> - Search queries via the default ckan-portal and a web-portal
>> - Ramp tests of 50,100,200...600...1000,...5000 users per second (!)
>> (we expect such numbers in the early on-line phase of our portal)
>> - 24 GB RAM, 8 CPUs
>>
>> We use the following parameter settings:
>> 1) apache2.conf
>> ServerLimit 300
>> MaxClients 300
>> for all occurrences
>>
>> 2) Jetty Java_Options:
>> -Xms512M -Xmx4g
>>
>> 3) virtual host ckan_default:
>> WSGIDaemonProcess ckan_default display-name=ckan_default processes=2
>> threads=30
>>
>> We get a mean response time of ca. 30 seconds with 600 concurrent
>> users per second, and several errors. Altogether we are not happy
>> with this.
>>
>> The CPUs are about 50% busy, and only ca. 5 GB of the 24 GB RAM is used.
>> The ckan-portal and web-portal show the same results.
>>
>>
>> Can somebody explain the above parameters, their optimal settings
>> and their effects?
>> E.g. do Apache workers correspond to threads? Are there multiple
>> Jetty processes or only one when CKAN is running?
>> Does anybody have experience in this direction or pointers to further
>> information?
>>
>> Best wishes,
>> Lothar
>>
>>
>>