[ckan-dev] Elastic search datastore

Ben Scott ben at benscott.co.uk
Tue Oct 27 11:23:11 UTC 2015


Hi Ross -

Thanks very much for your reply! Yep, that’s it, and I also found this old datastore client repo which uses ES: https://github.com/okfn/datastore-client. That does explain why it was removed - I completely agree that using ES/Solr as a database isn’t a good idea. We don’t use the datastore's write/delete capabilities on our portal; it’s read-only, with the data held in another file or database. So essentially we use the datastore as an index of other resources, but that’s probably not such a common use case.

It was the “well-indexed” bit which we struggled with :) As we wanted to allow per-field filtering, the UX was slow unless we added an index to every field - and when we did that on a table with 150 columns x 3m rows, the DB wasn’t happy! Using Solr for filtering was much more performant - but yep, replacing the datastore altogether isn’t the way to go, so we’ll need to think a bit more about our approach here.
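
To make the per-field filtering concrete, here is a minimal sketch of how a set of field filters might be turned into Solr request parameters, with one fq (filter query) clause per field. The function names and the example field names are hypothetical and are not taken from our portal code; only the q, rows and fq parameters are standard Solr.

```python
from urllib.parse import urlencode


def solr_quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_filter_params(filters, rows=10):
    """Turn a {field: value} dict into a list of Solr query parameters.

    Each filter becomes its own fq clause, so Solr can cache the result
    of each field filter separately from the main query.
    """
    params = [("q", "*:*"), ("rows", str(rows))]
    # Sort by field name so the generated query string is deterministic.
    for field, value in sorted(filters.items()):
        params.append(("fq", f"{field}:{solr_quote(value)}"))
    return params


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    params = build_filter_params({"country": "Peru", "family": "Orchidaceae"})
    # urlencode accepts a list of pairs, so repeated fq keys are preserved.
    print(urlencode(params))
```

Because every filter is an independent fq clause, adding another filterable field does not require another database index, which is roughly why this approach scaled better for us than indexing every column.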

Many thanks again for your help,
Ben

> On 26 Oct 2015, at 18:26, Ross Jones <ross at servercode.co.uk> wrote:
> 
> Hi Ben,
> 
>> On 26 Oct 2015, at 11:29, Ben Scott <ben at benscott.co.uk> wrote:
>> 
>> Hi -
>> 
>> I was wondering if anyone could shed a bit of light regarding the decision to move the datastore from elasticsearch to postgres?  Were there reasons why elasticsearch was considered unsuitable?
> 
> Wow, that's a blast from the past.  You're talking about http://trac.ckan.org/ticket/1797.html right?  This might have been the second (and not final) version of the datastore/webstore (there was one that had separate sqlite DBs as well).  
> 
> It was primarily the work of one person over a two-week sprint - I'm not sure it ever saw production use, and I don't think there was general acceptance that using ES as a database was a very wise idea - certainly back then; it may be better suited to it now. Also, useful things like being able to query with SQL were quite appealing.
> 
>> 
>> We're using CKAN at the NHM for one pretty big dataset (2.8m+ records) - with other, larger datasets coming online next year. We needed people to be able to explore these datasets, filter on any field, etc., and postgres really struggled, probably in large part due to us shoehorning our messy data into a one-table structure. So we installed SOLR to provide the search index for that dataset, which works well.
>> 
>> We're now trying to scale up, and were considering that the postgres datastore would be redundant if we switched to using SOLR / elasticsearch to index all our datasets. But after looking into it and realising previous versions of CKAN had already used elasticsearch for the datastore, we don't want to make the same mistakes if that approach has been tried and failed.
> 
> Sorry I couldn't be more helpful, but I just don't think it went far enough to give you any useful data on how effective it was - perhaps someone else might remember specifics? I'm still nervous about using Solr as a database of any sort, but I guess this is mostly going to be reading rather than writing, so perhaps it'll work - but you can get a very long way by tuning the default postgres config (which isn't great), and turning off fsync and related options. I don't think a well-indexed 3M row table is beyond its abilities.
> 
> Cheers
> 
> Ross
> 
> _______________________________________________
> ckan-dev mailing list
> ckan-dev at lists.okfn.org
> https://lists.okfn.org/mailman/listinfo/ckan-dev
> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
