[ckan-dev] Bootup performance

Steven De Costa steven.decosta at linkdigital.com.au
Tue Apr 7 11:29:32 UTC 2015


Hi Alex,

We run around 6k datasets on data.gov.au, but some of those have resources
with more than a million records, so it is a bit bigger than it sounds. We
have around 4k on data.vic.gov.au.

Although this is not the exact infrastructure we use, it is similar:
http://www.linkdigital.com.au/opendata_architecture.php

We've moved to RDS on AWS for the PaaS platform we've built for
datashades.com.

M3.large instances are what we use for web app roles. Right now we use
xlarge for DB. Solr is a medium (I think, but maybe also a large...)

Startup takes a few minutes at most but we have caching in place to keep
things up. Plus multiple web app servers means they can glitch and be
replaced within the autoscale group without downtime.

I'm not entirely sure how long a cold restart takes, as the bigger of those
two environments has generally been up for a few years. But I suspect we
can recheck on a UAT environment and let you know.
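
A minimal sketch of how such a cold-restart check could be timed: poll the
instance over HTTP right after the restart and record how long it takes to
answer. The names http_probe and time_until_ready and the UAT hostname are
hypothetical, and using the status_show API action as the readiness check is
an assumption, not how either site is actually monitored.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP error codes all mean "not ready".
        return False


def time_until_ready(probe, max_wait=1800.0, interval=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True.

    Returns the elapsed seconds, or None if max_wait is reached first.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= max_wait:
            return None
        sleep(interval)


# Example use (hypothetical host; start this right after restarting CKAN):
#   url = "http://ckan-uat.example.org/api/3/action/status_show"
#   elapsed = time_until_ready(lambda: http_probe(url))
#   print("not ready in time" if elapsed is None else f"ready after {elapsed:.0f}s")
```

The result is only as precise as the polling interval, and a 200 from one
endpoint does not prove that every page and resource is loading correctly.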

Cheers,
Steven


On Tuesday, April 7, 2015, Alex Corbi <a.corbi at gmail.com> wrote:

> Hi Steve,
>
> Thanks for your answer. Our current docker setup does indeed separate the
> different components into different containers (1x postgresql, 1x ckan, 1x
> solr).
>
> In order to compare… could you please tell me a bit about your CKAN
> instance:
> - How many datasets are currently hosted?
> - What are the specs of the machines where CKAN runs?
> - Approximately how long does it take for the CKAN instance to boot after
> a reset or shutdown?
>
> --
> Alex Corbi
>
> On 7 April 2015 at 12:14:14, Alex Corbi (a.corbi at gmail.com) wrote:
>
>  Hi Everyone,
>
> I have a performance question about how bootup times depend on the number
> of stored datasets.
>
> The context is http://data.opendevelopmentmekong.net/, an instance based
> on CKAN v2.2.1, deployed through uwsgi on a server with 2 cores and 4GB of
> memory (using docker containers), currently with VERY low traffic and 13
> datasets hosted. In this scenario, bootup times after a restart of the
> docker container for CKAN are quick and do not present any issue.
>
> However, the bootup time and related issues increase considerably with
> the number of datasets. On the very same setup, but populated with
> ~2000 datasets, the CKAN instance takes up to 20 minutes to boot and
> sometimes shows erratic behaviour after rebooting (Internal Server
> Errors, random URLs and resources not being loaded).
>
> So, here are my questions:
> - Do the characteristics of the described system (number of datasets,
> traffic) comply with the "small to medium" instance type mentioned on
> https://github.com/ckan/ckan/wiki/Hardware-Requirements? Are 2 core/4GB
> mem ok?
>
> - During the bootup process, activity on the CKAN and Postgresql side can
> be detected. Both components take a big percentage of the CPU during the
> bootup (~20 minutes). What is supposed to be happening behind the scenes?
> Is Solr reindexing every time CKAN restarts?
>
> - Is there anything that can be done to reduce the boot time of a
> restarted CKAN instance? (DB/Solr/ckan conf.)
>
> Thanks in advance, any help is appreciated,
>
>  --
> Alex Corbi
>  ------------------------------
>
>

-- 
*STEVEN DE COSTA* | *EXECUTIVE DIRECTOR*
www.linkdigital.com.au