[ckan-dev] Elastic search datastore

Ben Scott ben at benscott.co.uk
Wed Oct 28 12:19:37 UTC 2015


Hi Ross -

Breaking the data into separate tables would definitely help - but won’t that also make it difficult to use with the API? I always thought the logic of the CKAN datastore was one table per resource; otherwise we’d have to alter the API queries. Do you know of other sites splitting their datastore across tables like this? If that works, it could definitely be the way to go.
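
For illustration, here is a minimal sketch of how a query could span two datastore tables through CKAN's datastore_search_sql action, which accepts read-only SQL. The portal URL, resource ids, join key and column names below are all hypothetical, and the sketch assumes the split tables share a key column and that the portal has SQL search enabled. It only builds the request URL; it does not call the API.

```python
from urllib.parse import urlencode

# Hypothetical values: substitute a real portal URL and real resource ids.
CKAN_URL = "http://data.example.org"
CORE_RESOURCE = "11111111-1111-1111-1111-111111111111"
EXTRA_RESOURCE = "22222222-2222-2222-2222-222222222222"


def build_join_sql(core_id, extra_id, key, columns, limit=100):
    """Build a SELECT joining two datastore tables on a shared key column.

    Datastore tables are named after their resource id, so the ids are
    quoted as SQL identifiers. `columns` are already-qualified column
    expressions such as 'c."scientificName"'.
    """
    cols = ", ".join(columns)
    return (
        f'SELECT {cols} FROM "{core_id}" AS c '
        f'JOIN "{extra_id}" AS e ON c."{key}" = e."{key}" '
        f"LIMIT {int(limit)}"  # always cap the result size
    )


def build_search_sql_url(base_url, sql):
    """Return the datastore_search_sql URL for the given SQL statement."""
    query = urlencode({"sql": sql})
    return f"{base_url}/api/3/action/datastore_search_sql?{query}"


sql = build_join_sql(
    CORE_RESOURCE,
    EXTRA_RESOURCE,
    key="occurrenceID",  # hypothetical shared key column
    columns=['c."scientificName"', 'e."locality"'],  # hypothetical columns
    limit=10,
)
url = build_search_sql_url(CKAN_URL, sql)
print(url)
```

The catch is the one raised above: clients would have to know which table holds which columns, so the plain datastore_search call on a single resource would no longer cover every field.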

Yep, it’s far too big to get into memory - at least with the resources we have at the Museum. One of our data files is http://data.nhm.ac.uk/resources/gbif_dwca.zip, and here’s the UI we’ve built to explore it: http://data.nhm.ac.uk/dataset/56e711e6-c847-4f99-915a-6894bb5c5dea/resource/05ff2255-c38a-40c9-b657-4ccb55ab2feb?view_id=6b611d29-1dcf-4c60-b6b5-4cbb69fdf4fe

Thanks again for your help,
Ben

> On 27 Oct 2015, at 14:11, Ross Jones <ross at servercode.co.uk> wrote:
> 
> Hi,
> 
>> On 27 Oct 2015, at 11:23, Ben Scott <ben at benscott.co.uk> wrote:
>> Thanks very much for your reply! Yep, that’s it, and I also found this old datastore client repo which uses ES: https://github.com/okfn/datastore-client. That does explain why it was removed - I completely agree using ES/Solr as a database isn’t good. We don’t use the datastore's write/delete capabilities on our portal; it’s read-only, with data held in another file or database. So essentially we use the datastore as an index of other resources, but that’s probably not such a common use case.
>> It was the “well-indexed” bit which we struggled with :) As we wanted to allow per-field filtering, the UX was slow unless we added an index to every field - and when we did that on a table with 150 columns x 3m rows the DB wasn’t happy!  Using Solr for filtering was much more performant - but yep, replacing the datastore altogether isn’t the way to go, we’ll need to think a bit more about our approach here.
> 
> I think it depends on how you want to allow people to access it. Perhaps you could break the 150 columns into several tables, and make it easy to join? Can you enforce LIMITs (a quick EXPLAIN on the query will tell you if there is already a limit or not)? Is it *really* too big to just all go in RAM?
> 
> Do you have an example file that you're looking to use that I could have a peek at, just out of general interest/nosiness?  
> 
> Cheers
> 
> Ross
> 
