[ckan-dev] Elastic search datastore

Matthew Fullerton matt.fullerton at gmail.com
Tue Nov 10 13:30:16 UTC 2015


Just wanted to point out Rufus's nice project for playing with datastore
SQL queries and joins:
http://dev.rufuspollock.org/ckan-explorer/?endpoint=http://data.nhm.ac.uk
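
For anyone who wants to try a join outside the explorer, here is a rough
sketch of building the request by hand. It assumes the datastore_search_sql
API action is enabled on the target site. The table aliases "specimens" and
"collectors", the key column "collector_id", and the helper names are all
made up for illustration and not taken from any real dataset.

```python
from urllib.parse import urlencode


def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_join_sql(left, right, key, limit=10):
    """Build a simple inner join across two datastore tables (or aliases)."""
    k = quote_ident(key)
    return (
        f"SELECT * FROM {quote_ident(left)} AS a "
        f"JOIN {quote_ident(right)} AS b ON a.{k} = b.{k} "
        f"LIMIT {int(limit)}"
    )


def build_join_url(endpoint, left, right, key, limit=10):
    """Return a datastore_search_sql URL for the join; nothing is fetched here."""
    base = endpoint.rstrip("/") + "/api/3/action/datastore_search_sql"
    return base + "?" + urlencode({"sql": build_join_sql(left, right, key, limit)})


if __name__ == "__main__":
    # Hypothetical aliases and key column; substitute real ones before use.
    print(build_join_url("http://data.nhm.ac.uk", "specimens", "collectors",
                         "collector_id", limit=5))
```

The URL can then be fetched with any HTTP client. Whether the join works
depends on both tables existing and being readable on that site.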


On 28 October 2015 at 14:37, Ross Jones <ross at servercode.co.uk> wrote:

> Hi,
>
> > On 28 Oct 2015, at 12:19, Ben Scott <ben at benscott.co.uk> wrote:
> >
> > Hi Ross -
> >
> > Breaking into separate tables would definitely help - but won’t that
> > also be difficult to use with the API? I always thought the logic of the
> > CKAN datastore was one table per dataset, otherwise we’d have to try and
> > alter the API queries. Do you know of other sites denormalising their
> > datastore? If that works it could definitely be the way to go.
>
> Datastore's actually one table per resource. Personally I think the
> reliance on being associated with resources is unnecessary; multiple tables
> per dataset (without resources) makes more sense to me, but we are where
> we are.
>
> It should be possible for you to use datastore_create to create several
> datastore tables (and associated resources), and then query across them
> with joins.  You should give the datastore tables aliases though so that
> you're not querying resource uuids.  You may want to hide the resources, or
> not if you're splitting the file into several.
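
A minimal sketch of what the datastore_create calls described above might
look like, one request body per table. The parameter names (resource,
aliases, fields, records, primary_key) are the documented ones for
datastore_create. The dataset id, alias, and column names are invented for
illustration, and depending on the CKAN version the resource dict may need
further keys such as url.

```python
import json


def datastore_create_payload(package_id, alias, fields, records, primary_key=None):
    """Build the JSON body for one datastore_create call.

    fields is a list of (column_name, postgres_type) pairs; alias is the
    readable table name to use in SQL instead of the resource uuid.
    """
    payload = {
        # Passing "resource" asks CKAN to create the resource as well.
        "resource": {"package_id": package_id, "name": alias},
        "aliases": [alias],
        "fields": [{"id": name, "type": typ} for name, typ in fields],
        "records": records,
    }
    if primary_key:
        payload["primary_key"] = primary_key
    return payload


def to_request_body(payload):
    """Serialise a payload for POSTing to /api/3/action/datastore_create."""
    return json.dumps(payload).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical dataset and table; an API key header is needed to POST this.
    body = to_request_body(datastore_create_payload(
        "my-dataset", "specimens",
        [("specimen_id", "int"), ("collector_id", "int"), ("name", "text")],
        [{"specimen_id": 1, "collector_id": 7, "name": "example"}],
        primary_key="specimen_id",
    ))
    print(body.decode("utf-8"))
```

Repeating this for each table gives several aliased datastore tables that
can then be joined by alias.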
>
> I would skip the datapusher entirely for this use case though, definitely
> worth doing your own ETL to get data in.
>
>
> > Yep, it’s far too big to get into memory - at least with the resources
> > we have at the Museum.  Here's one of our data files:
> > data.nhm.ac.uk/resources/gbif_dwca.zip, and here’s the UI we’ve built to
> > explore it:
> > http://data.nhm.ac.uk/dataset/56e711e6-c847-4f99-915a-6894bb5c5dea/resource/05ff2255-c38a-40c9-b657-4ccb55ab2feb?view_id=6b611d29-1dcf-4c60-b6b5-4cbb69fdf4fe
>
> Will definitely have a nosey look at the datafile when I've got a stronger
> wifi connection ;)
>
> Trying out your UI, it looks like the faceting is very useful, so perhaps
> your approach might be the best rather than re-inventing the faceting on
> top of datastore.
>
> Having the datastore data indexed in a separate Solr core might be an
> interesting thing generally for the datastore. Having facets on
> mostly-structured data sounds neat. Is this something you could possibly
> break out into a plugin for the datastore?
>
> Cheers
>
> Ross