[ckan-dev] CKAN Multi-site

Ross Jones ross at servercode.co.uk
Fri Dec 12 09:11:49 UTC 2014


Hi Florian,

> On 11 Dec 2014, at 03:30, Florian May <florian.wendelin.mayer at gmail.com> wrote:
> not sure whether this is helpful, but we're working on a multi-site CKAN (master branch because of #1779) on AWS (Ubuntu 14.04
> LTS / t2.medium / 100GB hd) for an Australian state government agency. We're running three CKANs (three separate .ini configs running as Apache WSGI apps, three CKAN plus three datastore databases, three separate filestore locations) sharing one Solr core, with three supervisor'd celery/redis task queues. The AWS VM is orchestrated using SaltStack and proxied with nginx both on the AWS side and on ours.

This sounds great. Running like this seems the best approach for what you’re doing, and it should at least be reasonably straightforward to migrate an instance somewhere if you ever needed to. I guess a single Solr core is working well for you, but presumably there’s no reason they couldn’t each have their own?
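
A rough sketch of generating the per-instance settings from one install, assuming hypothetical instance names, hostnames and paths (passwords are deliberately left out). As far as I know, CKAN stores ckan.site_id with each dataset it indexes and filters searches on it, which is what lets several sites share one Solr core; giving each instance its own core would just mean a different solr_url per instance.

```python
import configparser
import io

# Hypothetical instance names; substitute your own.
INSTANCES = ["internal", "public", "sandbox"]
# One shared core for all sites (assumed URL).
SHARED_SOLR_URL = "http://127.0.0.1:8983/solr/ckan"


def render_ini(name, db_host="localhost"):
    """Return the per-instance [app:main] overrides for one CKAN site as ini text.

    Credentials are omitted on purpose; they would be injected separately.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["app:main"] = {
        # A distinct site_id keeps each instance's documents apart in the shared core.
        "ckan.site_id": name,
        "ckan.site_url": f"https://{name}.example.org",
        # Separate CKAN and datastore databases per instance.
        "sqlalchemy.url": f"postgresql://ckan_{name}@{db_host}/ckan_{name}",
        "ckan.datastore.write_url": f"postgresql://ckan_{name}@{db_host}/datastore_{name}",
        "ckan.datastore.read_url": f"postgresql://datastore_{name}_ro@{db_host}/datastore_{name}",
        # Separate filestore location per instance.
        "ckan.storage_path": f"/var/lib/ckan/{name}",
        "solr_url": SHARED_SOLR_URL,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```

Each rendered section would then be merged into a shared template holding the rest of the usual [app:main] settings.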

One thing I’ve been thinking about recently is how to handle harvesting or QA in multi-tenant systems. Both have background tasks for long-running work, but I’m not sure each backend task is told which CKAN instance it is for. I suspect they depend on the ckan.ini; how do you handle this? One set of celery tasks per instance?

Perhaps it would be better if one instance of the harvester could support multiple instances of CKAN? That would also broaden the ecosystem a bit, and would allow people to provide services to CKAN users who don’t want to set up and manage their own harvesters. I have no idea how, or whether, this could work, but I can think of a few sites that want harvesting without having to set it up and manage it.
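
To picture the one-queue-per-instance option, here is a sketch where every job names the CKAN instance it belongs to and a worker resolves that instance's own ckan.ini before running it. The registry, paths, Job class and queue names are all hypothetical and are not part of ckanext-harvest or ckanext-qa; a single shared harvester would need something similar, with each job carrying enough information to reach the right site.

```python
from dataclasses import dataclass, field

# Hypothetical registry: instance name -> that instance's own ini file.
CONFIGS = {
    "internal": "/etc/ckan/internal/production.ini",
    "public": "/etc/ckan/public/production.ini",
    "sandbox": "/etc/ckan/sandbox/production.ini",
}


@dataclass
class Job:
    instance: str                 # which CKAN site the job belongs to
    kind: str                     # e.g. "harvest" or "qa"
    payload: dict = field(default_factory=dict)


def queue_name(instance):
    """One queue per instance, mirroring one celery worker per ckan.ini."""
    return f"ckan-{instance}"


def resolve_config(job):
    """Return the ini file a worker must load before running this job."""
    try:
        return CONFIGS[job.instance]
    except KeyError:
        raise ValueError(f"unknown CKAN instance: {job.instance!r}") from None


def route(jobs):
    """Group jobs by target queue so each instance's worker only sees its own."""
    queues = {}
    for job in jobs:
        resolve_config(job)  # fail fast on jobs for unknown instances
        queues.setdefault(queue_name(job.instance), []).append(job)
    return queues
```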

> Our three production instances are one internal-facing for our sensitive and unpublished/cooling-off data sets, one external-facing (to be officially sanctioned later), and one internal-facing sandbox for users to learn and developers to test their scripts without regrets. We felt it more practicable to use the same install for all three instances, rather than installing three separate CKAN instances.

That sounds like a really good idea, much simpler to upgrade all of the instances at once.

> If we can separate out sensitive config settings and data we could provide the clean AWS snapshot if that's any help. Realistically that should include the 2.3 release.

I think Link Digital already provide an AMI for CKAN, but if yours is different I guess it couldn’t hurt to have more choice. One thing that isn’t really covered in the US ODI issues is platform: a couple of platforms are mentioned, but it isn’t clear to me whether the goal is to make CKAN usable in a multi-tenant scenario with any provider, or just to pick one provider and use that. From work on datapress.io I know that running multi-tenant CKAN right now isn’t impossible; it’s a bit fiddly in places, but it is definitely feasible. I think the goal of the ODI work is to make it much more straightforward and much easier to deploy on different platforms.

> If our deployment architecture makes any sense to you all, in order to scale up to multiple agencies as per the US Open Data Institute's suggestion, we'd simply spin up one instance of our AWS image per agency. The only effort would be to separate out sensitive settings for db passwords, maybe a default CKAN sysadmin account, nginx reverse proxy settings and salt server orchestration settings. Getting all sensitive settings from one global config would be great, but it would require CKAN to read settings from environment variables (cf. site_url and docker images).

The tickets at https://github.com/opendata/CKAN-Multisite/issues/8 and https://github.com/ckan/ideas-and-roadmap/issues/88 cover some of the configuration issues you mention. We made a start by allowing the location of ckan.ini to be specified in $CKAN_INI. That is only really helpful on the command line in single-instance environments, but it does let you set the location of your .ini at the Apache level with SetEnv rather than in the wsgi file. It is only a small step in a bigger potential move, though.
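
A minimal sketch of the direction those tickets point in: take the .ini location from $CKAN_INI, then overlay sensitive values from environment variables so a shared image carries no secrets. The fallback path and the variable names in ENV_OVERRIDES are hypothetical, not settings CKAN 2.3 is known to read. Both functions accept the environment as an optional mapping, which keeps them easy to test and lets the values come from somewhere other than os.environ.

```python
import os

# Assumed fallback when $CKAN_INI is not set.
DEFAULT_INI = "/etc/ckan/default/production.ini"

# Hypothetical mapping of environment variables to CKAN config keys.
ENV_OVERRIDES = {
    "CKAN_SQLALCHEMY_URL": "sqlalchemy.url",
    "CKAN_DATASTORE_WRITE_URL": "ckan.datastore.write_url",
    "CKAN_SITE_URL": "ckan.site_url",
}


def config_path(environ=None):
    """Return the ini path from $CKAN_INI, falling back to a default."""
    environ = os.environ if environ is None else environ
    return environ.get("CKAN_INI", DEFAULT_INI)


def apply_env_overrides(settings, environ=None):
    """Return a copy of settings with any non-empty environment overrides applied."""
    environ = os.environ if environ is None else environ
    merged = dict(settings)  # leave the caller's dict untouched
    for var, key in ENV_OVERRIDES.items():
        if environ.get(var):
            merged[key] = environ[var]
    return merged
```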

> Let us know what you think, any feedback would be greatly appreciated!

Sounds great, thanks for the detail; it’s always good to hear how people are using and installing CKAN. Might you consider pasting it at http://github.com/opendata/CKAN-Multisite/issues as a new issue, perhaps titled “How we do multi-tenant”?

Ross.
