[ckan-dev] Bootup performance

Denis Zgonjanin deniszgonjanin at gmail.com
Wed Apr 8 14:52:26 UTC 2015


Hi Alex,

I don't see anything too funny in your fork of CKAN, and it runs fine for
me, though I didn't quite load a thousand datasets into it.

I suspect it may be an orchestration problem. Can you describe a bit about
how you are using Docker to run and deploy this? Specifically, Dockerfiles
and any associated scripts that run on startup would help.
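
To illustrate the kind of startup ordering issue I have in mind, here is a
minimal sketch of a check that could run before CKAN is started, so that the
web process only boots once PostgreSQL and Solr accept connections. The
function names, the host names in the usage note, and the idea of calling
this from a container startup script are all assumptions for illustration,
not something taken from your setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only proves the port is open, not that the
            # service behind it has finished initialising.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_services(services, timeout=60.0, interval=1.0):
    """Check each (name, host, port) in order; return the names that never came up."""
    return [
        name
        for name, host, port in services
        if not wait_for_port(host, port, timeout=timeout, interval=interval)
    ]
```

A startup script could call something like
wait_for_services([("postgresql", "db", 5432), ("solr", "solr", 8983)]) and
refuse to launch CKAN if the returned list is not empty. The host names "db"
and "solr" are hypothetical; 5432 and 8983 are the usual default ports for
PostgreSQL and Solr.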

- Denis



On Tue, Apr 7, 2015 at 1:02 PM, Alex Corbi <a.corbi at gmail.com> wrote:

> Hi Ian,
>
> I would like to add to my previous email that we are running a customized
> clone of CKAN, with ONLY the following changes:
> https://github.com/OpenDevelopmentMekong/ckan/commits/master
>
> Could you please review them and tell me if you see anything strange?
> Maybe this commit has something to do with the issue:
> https://github.com/OpenDevelopmentMekong/ckan/commit/1791fe2dc7a65cdd819862ebc2ef7a9309469ebe
>
> Anyway, tomorrow we are going to deploy our code based on a vanilla CKAN
> 2.3 and see whether the issue is still there.
>
> Today we tried disabling auto_commit on Solr (as described at
> https://github.com/ckan/ckan/wiki/Performance-tips-for-large-imports#solr)
> and disabling our own theme (
> http://github.com/OpenDevelopmentMekong/ckanext-odm_theme), with the same
> result: extremely high CPU and I/O load between the CKAN and PostgreSQL
> containers on startup.
>
> Thanks in advance for the support, as always!
>
> --
> Alex Corbi
>
> On 7 April 2015 at 14:13:08, Alex Corbi (a.corbi at gmail.com) wrote:
>
>  Hi Ian,
>
> > Web start up time should not depend on the number of datasets, and should be measured in seconds not minutes.
>
> OK, this is key information, because the issue we are having definitely depends on the number of datasets stored.
>
> > Are you running anything else during start up?
>
> Sometimes we have also needed to restart the Solr container (running docker stop solr; docker stop ckan; docker start solr; docker start ckan;). As mentioned, we have the components of the architecture (postgresql, solr, ckan) separated into different Docker containers. What do you think in general about using Docker for deploying CKAN?
>
> > Have you tried disabling plugins in your ini file? Have you made any changes to ckan?
>
> Here is the list of plugins that we currently have enabled in the production.ini file:
>
>
> ckan.plugins = stats text_preview recline_preview pdf_preview datastore datapusher resource_proxy multilingual_dataset multilingual_group multilingual_tag odm_theme pages googleanalytics geojson_preview wms_preview
>
> odm_theme is our own theme, developed for UI customization and for adding some logic. You can browse the code here: http://github.com/OpenDevelopmentMekong/ckanext-odm_theme Do you see anything odd in the implementation?
>
>
>
>  Ian
>
>
>  --
> Alex Corbi
>
> On 7 April 2015 at 12:46:21, Alex Corbi (a.corbi at gmail.com) wrote:
>
>   Hi Steve,
>
>  Thanks for your answer. Our current Docker setup does indeed separate the
> different components into different containers (1x postgresql, 1x ckan, 1x
> solr).
>
>  In order to compare, could you please tell me a bit about your CKAN
> instance:
>  - How many datasets are currently hosted?
>  - What are the specs of the machines where CKAN runs?
>  - Approximately how long does it take for the CKAN instance to boot after
> a reset or shutdown?
>
>  --
> Alex Corbi
>
> On 7 April 2015 at 12:14:14, Alex Corbi (a.corbi at gmail.com) wrote:
>
>   Hi Everyone,
>
> I have a performance question about how bootup times depend on the number
> of stored datasets.
>
> The context is http://data.opendevelopmentmekong.net/, which is an
> instance based on CKAN v2.2.1 deployed through uwsgi on a server with 2
> cores and 4GB of memory (using Docker containers), currently with VERY low
> traffic and 13 datasets hosted. In this scenario, bootup times after
> restarting the Docker container for CKAN are quick and do not present any
> issue.
>
> However, the bootup time and related issues increase considerably with
> the number of datasets. On the very same setup, but populated with
> ~2000 datasets, the CKAN instance takes up to 20 minutes to boot and
> sometimes shows erratic behaviour after rebooting (Internal Server
> Errors, random URLs and resources not being loaded).
>
> So, here my questions:
> - Do the characteristics of the described system (number of datasets,
> traffic) fit the "small to medium" instance type mentioned on
> https://github.com/ckan/ckan/wiki/Hardware-Requirements? Are 2 cores/4GB
> of memory OK?
>
> - During the bootup process, activity can be detected on both the CKAN and
> PostgreSQL side. Both components take a big percentage of the CPU during
> the bootup (~20 minutes). What is supposed to be happening behind the
> scenes? Is Solr reindexing every time CKAN restarts?
>
> - Is there anything we can do to reduce the boot time of a restarted CKAN
> instance? (DB/Solr/ckan conf.)
>
> Thanks in advance, any help is appreciated,
>
>  --
> Alex Corbi
>
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>
>