[ckan-dev] Datastore Solr search extension

Alice Heaton a.heaton at nhm.ac.uk
Fri Mar 27 14:48:03 UTC 2015


Hello,

Motivated by low PostgreSQL performance on very large datasets, we have 
implemented an extension to use Solr to perform datastore searches.

The extension provides a separate API entry point which aims to be 
compatible with the datastore_search API entry point, and uses Solr to 
perform searches. The extension can also override the 
datastore_search entry point so that it uses the Solr version, either 
for all datasets or per dataset.
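
As a rough illustration of what "compatible with datastore_search" means 
for a client, here is a minimal Python sketch that builds a standard CKAN 
action API request. The base URL and resource id are made-up examples, and 
the `action` parameter is a placeholder: this message does not give the name 
of the Solr entry point, so check the extension's README for it. If the 
override is enabled, the plain datastore_search call should not need to 
change at all.

```python
import json
import urllib.request


def build_search_request(base_url, resource_id, query=None, limit=10,
                         action="datastore_search"):
    """Build (but do not send) a CKAN action API request for a datastore search.

    `action` defaults to the stock CKAN entry point; pass the extension's
    Solr-backed action name instead to target it explicitly (the name is
    not stated in this message, so it is left to the caller).
    """
    payload = {"resource_id": resource_id, "limit": limit}
    if query is not None:
        payload["q"] = query  # full-text query, as in datastore_search
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request and return the 'result' part of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]
```

Building the request separately from sending it makes it easy to point the 
same client code at either entry point and compare the results.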

There are some trade-offs of course. In particular, this does not provide 
real-time indexing - not even near-real-time. It is aimed at cases where 
the data either does not change, or changes at pre-determined intervals 
(e.g. weekly), and where some lag in the indexing is acceptable. 
Within those constraints, however, it is *much* faster than using PostgreSQL.

You will need a good understanding of Solr to use this. As the current 
version does not attempt to use schema-less or dynamic fields, you will 
need to write a Solr schema for the target dataset, and a data importer 
to index it. Note that the plugin is extensible, so it should be 
possible to add support for schema-less or dynamic fields.
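
This message does not say how the importer should be built, so the following 
is only one possible approach, not the extension's own method: a minimal 
Python sketch that pushes rows to a Solr core through Solr's JSON update 
handler. The core URL and the field names are hypothetical; the fields must 
match whatever you declare in your own Solr schema.

```python
import json
import urllib.request


def build_index_request(solr_core_url, rows, commit=True):
    """Build (but do not send) a Solr JSON update request for a batch of rows.

    Each row is a dict whose keys must be fields declared in the Solr
    schema written for the target dataset.
    """
    docs = [dict(row) for row in rows]  # copy so callers' rows are untouched
    url = solr_core_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"  # make the batch searchable once it is indexed
    return urllib.request.Request(
        url=url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return Solr's decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because indexing is expected to run at intervals rather than in real time, 
a script like this could be run from a scheduled job after each data update.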

The plugin provides an interface, IDataSolr, which is analogous to the 
IDatastore interface and allows other plugins to modify the Solr queries.

We do not use this in production ourselves yet, so I wouldn't recommend 
that anyone else do so. However, if anyone wants to test it or help 
with the development, that would be welcome.

Finally, here is the link:

https://github.com/NaturalHistoryMuseum/ckanext-datasolr

Best Wishes,
Alice Heaton


