[ckan-dev] Performance CKAN / Configuration Parameters
Alice Heaton
a.heaton at nhm.ac.uk
Mon Aug 11 10:08:19 UTC 2014
Hello,
You may already be aware of these things, but just to throw in some ideas:
- A single process will only run on a single CPU, so setting
'processes=2' means your CKAN application will only run on 2 CPUs.
This might be what you want (to reserve the other CPUs for
Postgres/Jetty etc.), but it is good to keep in mind;
- I don't think ServerLimit affects mod_wsgi daemon mode, though I may
be wrong about this;
- With processes=2 and threads=30, you will serve at most 2*30
concurrent requests;
- It's important to remember that all requests, including those for
static files, go through mod_wsgi. If browsers are firing, say, 8
concurrent requests each, then that leaves you with about 2*30/8
concurrent clients, i.e. 7 or 8. This is very approximate: it depends
on the time taken for each request, client caching, etc., but it's a
good way to get an idea of what is happening. The best way to deal
with this is to add a caching server in front (say nginx or Varnish)
so that static files are served from the cache;
- You don't mention PostgreSQL settings, and whether you use the
datastore (and if so with how many rows).
On our setup (with the datastore and tables with over 3,000,000
rows), PostgreSQL is the slow point.
The default PostgreSQL settings are very conservative. The first
thing to do there is to increase shared_buffers -
the recommended value is about 25% of available memory. The next one
to set is effective_cache_size, this
should be roughly shared_buffers plus the amount of system caches.
What will make a real difference for a large database is to set
work_mem. This has to be tuned carefully, as you are
setting the memory available for each operation in a query - so a
query with 12 joins will use up to 12*work_mem.
If you set this too low, then your sorts/joins will happen on disc,
which can be very slow. If you set this too high, you might
run out of memory!
The best way to work this out is to enable slow query logging and
look for the slow queries. EXPLAIN ANALYZE will tell you
how much memory they need, and whether the operations happen on disc
or in memory. Increase work_mem to make them
happen in memory (if possible).
- In PostgreSQL, you should also check the number of allowed connections.
Depending on your settings/plugins, CKAN may make more
than one connection per request. With 2*30 workers, if each worker
makes more than one connection then you could run out of connections.
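
A rough sketch of the arithmetic in the points above, in Python. The
function names, the 2 connections per request and the 16 MB work_mem are
illustrative assumptions, not CKAN or PostgreSQL settings taken from this
thread; the 2 processes, 30 threads, 8 requests per browser, 12 joins and
24 GB of RAM are the figures already mentioned.

```python
# Back-of-the-envelope capacity estimates for a mod_wsgi + PostgreSQL setup.
# All function names are hypothetical helpers, not part of CKAN or mod_wsgi.


def max_concurrent_requests(processes: int, threads: int) -> int:
    """Upper bound on requests handled at the same time."""
    return processes * threads


def approx_concurrent_clients(processes: int, threads: int,
                              requests_per_client: int) -> float:
    """Very rough number of browsers that can be served at once."""
    return max_concurrent_requests(processes, threads) / requests_per_client


def connections_needed(processes: int, threads: int,
                       connections_per_request: int) -> int:
    """Worst-case database connections if every worker thread is busy."""
    return max_concurrent_requests(processes, threads) * connections_per_request


def suggested_shared_buffers_mb(total_ram_mb: int) -> int:
    """About 25% of available memory, the starting point suggested above."""
    return total_ram_mb // 4


def worst_case_query_memory_mb(work_mem_mb: int, operations: int) -> int:
    """A query with N sort/join operations may use up to N * work_mem."""
    return work_mem_mb * operations


# Figures from the thread: processes=2, threads=30, 8 requests per browser.
print(max_concurrent_requests(2, 30))        # 60
print(approx_concurrent_clients(2, 30, 8))   # 7.5

# Assumption: each request opens 2 database connections.
print(connections_needed(2, 30, 2))          # 120

# 24 GB of RAM, as on the server described in the original question.
print(suggested_shared_buffers_mb(24 * 1024))  # 6144

# Assumption: work_mem of 16 MB; 12 joins as in the example above.
print(worst_case_query_memory_mb(16, 12))    # 192
```

Under these assumptions the 120 connections would exceed PostgreSQL's
default max_connections of 100, which is the kind of mismatch worth
checking for.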
I'm interested in hearing about anything else you find that affects
performance, so please let us know!
Best Wishes,
Alice Heaton
On 06/08/14 18:54, hotz wrote:
> Hi all,
>
> we do performance tests with following setup:
> - CKAN 2.1.2
> - Search queries via the default ckan-portal and a web-portal
> - Ramp tests of 50,100,200...600...1000,...5000 users per second (!)
> (we expect such numbers in the early on-line phase of our portal)
> - 24 GB RAM, 8 CPUs
>
> Following parameter configurations in:
> 1) apache2.conf
> ServerLimits 300
> MaxClients 300
> for all occurrences
>
> 2) Jetty Java_Options:
> -Xms512M -Xmx4g
>
> 3) virtual host ckan_default:
> WSGIDaemonProcess ckan_default display-name=ckan_default processes=2
> threads=30
>
> We get response time of ca. mean 30 seconds each of 600 concurrent
> users per second.
> And several errors. Which altogether we feel bad with.
>
> The CPUs are 50% active, RAM uses only ca. 5GB (of the 24 GB).
> The ckan-portal and web-portal have same results.
>
>
> Is there somebody who can explain the above parameters and optimal
> settings of them and their influences?
> E.g. do apache workers correspond to threads? Are there multiple jetty
> processes or only one if CKAN is running?
> Has anybody experiences in this direction or hints to further
> information?
>
> Best wishes,
> Lothar
>
>
>