[ckan-dev] Elastic search datastore

Ross Jones ross at servercode.co.uk
Mon Oct 26 18:26:04 UTC 2015


Hi Ben,

> On 26 Oct 2015, at 11:29, Ben Scott <ben at benscott.co.uk> wrote:
> 
> Hi -
> 
> I was wondering if anyone could shed a bit of light regarding the decision to move the datastore from elasticsearch to postgres?  Were there reasons why elasticsearch was considered unsuitable?

Wow, that's a blast from the past.  You're talking about http://trac.ckan.org/ticket/1797.html, right?  This might have been the second (and not final) version of the datastore/webstore (there was one that had separate sqlite DBs as well).

It was primarily the work of one person over a 2-week sprint - I'm not sure it ever saw production use, and I don't think there was general acceptance that using ES as a database was a very wise idea, certainly not back then (it may be better suited for it now). Also, useful things like being able to query with SQL were quite appealing.

> 
> We're using CKAN at the NHM for one pretty big dataset (2.8m+ records) - with other, larger datasets coming online next year.  We needed people to explore these datasets, filter on any field, etc., and postgres really struggled, probably in large part due to us shoehorning our messy data into a one-table structure. So we installed SOLR to provide the search index for that dataset, which works well.
> 
> We're now trying to scale up, and were considering that the postgres datastore would be redundant if we switched to using SOLR / elasticsearch to index all our datasets.  But after looking into it and realising previous versions of CKAN had already used elasticsearch for the datastore, we don't want to make the same mistakes if that approach has been tried and failed.

Sorry I couldn't be more helpful, but I just don't think it went far enough to give you any useful data on how effective it was - perhaps someone else might remember specifics?  I'm still nervous about using Solr as a database of any sort, but I guess this is mostly going to be reading rather than writing, so perhaps it'll work. That said, you can get a very long way by tuning the default postgres config (which isn't great) and turning off fsync and related options. I don't think a well-indexed 3M-row table is beyond its abilities.

Cheers

Ross