[ckan-dev] Elastic search datastore

Ross Jones ross at servercode.co.uk
Wed Oct 28 13:37:06 UTC 2015


Hi,

> On 28 Oct 2015, at 12:19, Ben Scott <ben at benscott.co.uk> wrote:
> 
> Hi Ross -
> 
> Breaking into separate tables would definitely help - but won’t that also be difficult to use with the API? I always thought the logic of the CKAN datastore was one table per dataset, otherwise we’d have to try and alter the API queries. Do you know of other sites denormalising their datastore, because if that works it could definitely be the way to go. 

Datastore's actually one table per resource. Personally I think the reliance on being associated with resources is unnecessary, and multiple tables per dataset (without resources) makes more sense to me .. but we are where we are.

It should be possible for you to use datastore_create to create several datastore tables (and associated resources), and then query across them with joins. You should give the datastore tables aliases though, so that you're not querying by resource UUID. You may want to hide the resources, or not, if you're splitting the file into several.
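
A minimal sketch of that approach, assuming Python 3 and the standard action API endpoints (datastore_create, datastore_search_sql). The alias names ("occurrence", "taxon"), the field list, the join key, the site URL and the dataset name are all made up for illustration, and nothing is sent until you pass the request to urlopen:

```python
import json
import urllib.request


def datastore_create_payload(package_id, alias, fields, primary_key=None):
    """Body for datastore_create that creates a new resource plus its table."""
    payload = {
        "resource": {"package_id": package_id},  # let CKAN create the resource
        "aliases": [alias],                      # readable name for SQL queries
        "fields": fields,
    }
    if primary_key:
        payload["primary_key"] = primary_key
    return payload


def join_sql(left, right, key, limit=10):
    """SQL joining two datastore tables by alias instead of resource UUID."""
    return (
        f'SELECT * FROM "{left}" AS l '
        f'JOIN "{right}" AS r ON l."{key}" = r."{key}" '
        f'LIMIT {int(limit)}'
    )


def build_request(base_url, action, payload, api_key):
    """Prepare (but do not send) a POST to the CKAN action API."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )


# Hypothetical values: one table per part of the split file.
occurrence = datastore_create_payload(
    "my-dataset",
    "occurrence",
    [{"id": "occurrenceID", "type": "text"}, {"id": "taxonID", "type": "text"}],
    primary_key=["occurrenceID"],
)
create_req = build_request(
    "https://ckan.example.org", "datastore_create", occurrence, "API-KEY"
)
sql = join_sql("occurrence", "taxon", "taxonID")
search_req = build_request(
    "https://ckan.example.org", "datastore_search_sql", {"sql": sql}, "API-KEY"
)
# urllib.request.urlopen(create_req) would actually perform the call.
```

Whether datastore_search_sql is enabled, and which tables a given user may join across, depends on how the site is configured.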

I would skip the datapusher entirely for this use case though. It's definitely worth doing your own ETL to get the data in.
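
As a rough idea of what a home-grown load step could look like (a sketch only, not a tested loader): stream the CSV and build one datastore_upsert body per batch so the whole file never has to sit in memory. The batch size, the resource id and the sample columns are assumptions, and each payload would still need to be POSTed to the action API:

```python
import csv
import io
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def upsert_payloads(csv_file, resource_id, batch_size=1000):
    """Yield one datastore_upsert body per batch of CSV rows."""
    reader = csv.DictReader(csv_file)  # reads rows lazily, one at a time
    for chunk in batches(reader, batch_size):
        yield {
            "resource_id": resource_id,  # hypothetical id of an existing table
            "method": "insert",          # plain inserts; no primary key needed
            "records": chunk,
        }


# Tiny in-memory stand-in for a large file.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
payloads = list(upsert_payloads(sample, "RESOURCE-ID", batch_size=2))
```

With a real file you would iterate over the generator and send each payload as you go, rather than collecting them in a list as this toy example does.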


> Yep, it’s far too big to get into memory - at least with the resources we have at the Museum.  One of our data files - data.nhm.ac.uk/resources/gbif_dwca.zip, and here’s the UI we’ve built to explore it http://data.nhm.ac.uk/dataset/56e711e6-c847-4f99-915a-6894bb5c5dea/resource/05ff2255-c38a-40c9-b657-4ccb55ab2feb?view_id=6b611d29-1dcf-4c60-b6b5-4cbb69fdf4fe

Will definitely have a nosey look at the datafile when I've got a stronger wifi connection ;) 

Trying out your UI, it looks like the faceting is very useful, so perhaps your approach might be the best rather than re-inventing the faceting on top of datastore.

Having the datastore data indexed in a separate Solr core might be an interesting thing for the datastore generally. Facets on mostly-structured data sound neat. Is this something you could possibly break out into a plugin for the datastore?

Cheers

Ross





