[ckan-dev] key value store, caching and redis

Sat Jan 29 23:12:16 UTC 2011

On 28 January 2011 23:42, David Raznick <kindly at gmail.com> wrote:
> Hello
>
> There is a need for ckan plugins to have a place to store things.
> http://ckan.org/ticket/934

It would be useful to talk about some simple use cases:

Watch/follow a package extension: http://ckan.org/ticket/936
Download stats for resources: http://ckan.org/ticket/937
Config options in WUI extension: http://ckan.org/ticket/277
Apps extension (not yet ticketed -- allow users to register app ideas ...)

To summarize my views:

* 80/20 solution is very attractive at this early stage (there will
always be a version 2 if this turns out to be useful!)
* for simple cases the key/value setup probably buys us most of the
mileage we need. I would vote for the SQL option since it uses
existing tech, but could be persuaded otherwise.
* for complex cases option (1) (own tables) could be used, though it is
worth seeing what one could do with (2) (SQL k/v)

> There are 3 ideas on how to achieve this:
>
>  1.  Let plugins make their own sql tables.
>
> Pros.
>      Greatest flexibility.   Give power over completely to the plugin.
> Cons
>      Migration issues.  Different instances will have different schemas.
> Will definitely need a lot of manual work on every db upgrade.
>      There is nothing to stop a plugin from doing this already in its own
> database/key value store of choosing.  It will have to handle its own db
> upgrades.

Since a plugin's connection to the core schema should be pretty limited
(e.g. just pointing to package ids), I do not think differing schemas
will be an issue. For a complex extension I think this is quite an
attractive option.

>  2.  Make a key value table.  This table will have essentially 3 columns.
> Namespace, key, value.  The value being a serialised json object.  The
> namespace will denote what plugin owns that particular row.

Probably four columns, as Friedrich says: namespace, obj_id, key, value

> Pros
>      Flexible enough for most needs.
>      Simple to make.
> Cons
>      Serializing json in dbs is not great practice.
>      Data would be messy to handle

But the point here is that we would not be doing any complex querying
on the value field, so I am not sure data messiness is a problem.
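
To make option (2) concrete, here is a minimal sketch of the four-column
table, using Python's stdlib sqlite3 purely for illustration. The table
name (plugin_kv) and the helper names (make_store, kv_set, kv_get) are
made up for this example and are not part of ckan; a real version would
presumably go through the existing ckan model layer instead.

```python
import json
import sqlite3


def make_store(conn):
    # One row per (namespace, obj_id, key); namespace identifies the plugin.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS plugin_kv (
               namespace TEXT NOT NULL,
               obj_id    TEXT NOT NULL,
               key       TEXT NOT NULL,
               value     TEXT NOT NULL,
               PRIMARY KEY (namespace, obj_id, key))"""
    )


def kv_set(conn, namespace, obj_id, key, value):
    # The value is stored as serialised JSON and never queried into.
    conn.execute(
        "INSERT OR REPLACE INTO plugin_kv (namespace, obj_id, key, value) "
        "VALUES (?, ?, ?, ?)",
        (namespace, obj_id, key, json.dumps(value)),
    )
    conn.commit()


def kv_get(conn, namespace, obj_id, key, default=None):
    row = conn.execute(
        "SELECT value FROM plugin_kv "
        "WHERE namespace = ? AND obj_id = ? AND key = ?",
        (namespace, obj_id, key),
    ).fetchone()
    return json.loads(row[0]) if row else default
```

A watch/follow extension could then call something like
kv_set(conn, "follow", package_id, "followers", ["alice"]) and read it
back with kv_get, with all lookups going through the primary key only.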

> 3.  Use redis as key value store.  Keys can have their own namespace above.
> This can be optional as a config setting (but obviously needed if a plugin
> required it)
>
> Pros
>      Simple.
>      Data store suited to task.
>      Everything it did would be fast.
>      Atomic operations easy, e.g. counters, queues
>      Plugins could do many more things, without the need to manage own
> database, e.g. there could be a caching plugin, a pubsub plugin, a plugin
> that stored the last 10 packages a user viewed.
> Cons
>     All stored in memory
>     Another daemon process to run.  (even though Ubuntu has an up-to-date
> version in its repositories)
>     If used for caching and persistent data at the same time we will have to
> deal with durability/speed compromises. see
> http://redis.io/topics/persistence
>
> I personally would go with redis.  I think we need to rethink how we do
> caching at the moment and this could be the way to do it.

The real question is: is the cost of introducing a new component (in
both install complexity and dev complexity) worth the benefits that
redis brings over doing key/value in the database?

While it is true that if we use redis for caching we are already
requiring it, and so it is not another sysadmin requirement, we would
still have the dev complexity (extension authors need to know about
redis), and it seems the caching versus 'real db' requirements on redis
are somewhat different (as you mention).
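
As a rough data point for that comparison, here is a sketch of the
atomic counter case (e.g. download stats) done in plain SQL rather than
with redis's INCR command. It again uses stdlib sqlite3 only for
illustration; the table name (plugin_counter) and the function names are
hypothetical, and behaviour under real concurrent load on the production
database would need checking separately.

```python
import sqlite3


def make_counter_table(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS plugin_counter (
               namespace TEXT NOT NULL,
               obj_id    TEXT NOT NULL,
               key       TEXT NOT NULL,
               value     INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (namespace, obj_id, key))"""
    )


def incr(conn, namespace, obj_id, key, amount=1):
    ident = (namespace, obj_id, key)
    # "with conn" wraps the statements in one transaction and commits.
    with conn:
        # Make sure the row exists, then bump it in a single UPDATE so the
        # read-modify-write happens inside the database, not in Python.
        conn.execute(
            "INSERT OR IGNORE INTO plugin_counter "
            "(namespace, obj_id, key, value) VALUES (?, ?, ?, 0)",
            ident,
        )
        conn.execute(
            "UPDATE plugin_counter SET value = value + ? "
            "WHERE namespace = ? AND obj_id = ? AND key = ?",
            (amount,) + ident,
        )
        row = conn.execute(
            "SELECT value FROM plugin_counter "
            "WHERE namespace = ? AND obj_id = ? AND key = ?",
            ident,
        ).fetchone()
    return row[0]
```

So counters are doable in (2), at the price of a few statements and a
transaction per increment, where redis does it in one command.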

It would be worth you elaborating on the attractions of option (3) versus (2).

Rufus