[ckan-dev] Datastore Solr search extension
Alice Heaton
a.heaton at nhm.ac.uk
Fri Mar 27 14:48:03 UTC 2015
Hello,
Motivated by low PostgreSQL performance on very large datasets, we have
implemented an extension to use Solr to perform datastore searches.
The extension provides a separate API entry point which aims to be
compatible with the datastore_search API entry point, and uses Solr to
perform searches. The extension can also override the
datastore_search entry point so that it uses the Solr version, either
for all datasets or per dataset.
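As a rough illustration of what this compatibility means for a client, the sketch below builds the same JSON request for either entry point, so switching is only a change of action name. The host, the resource id and the action name "datastore_solr_search" are assumptions made for the example; check the extension's README for the actual name.

```python
import json
from urllib.request import Request


def build_search_request(base_url, action, resource_id,
                         q=None, filters=None, limit=100, offset=0):
    """Build (but do not send) a POST request for a CKAN action API call.

    The payload uses the standard datastore_search parameters, so the
    same body can be pointed at a compatible entry point.
    """
    payload = {"resource_id": resource_id, "limit": limit, "offset": offset}
    if q is not None:
        payload["q"] = q              # full-text query
    if filters:
        payload["filters"] = filters  # exact-match filters, e.g. {"country": "Peru"}
    url = "{}/api/3/action/{}".format(base_url.rstrip("/"), action)
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage: same arguments, different action name.
pg_req = build_search_request(
    "http://localhost:5000", "datastore_search", "my-resource-id", q="fish")
solr_req = build_search_request(
    "http://localhost:5000", "datastore_solr_search", "my-resource-id", q="fish")
```

Passing either request to urllib.request.urlopen would send it; when datastore_search itself is overridden, existing clients would not need to change at all.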
There are some trade-offs of course. In particular, this does not provide
real-time indexing, or even near-real-time indexing. It is aimed at cases
where the data either does not change, or changes at pre-determined
intervals (e.g. weekly), and where some lag in the indexing is acceptable.
Within those constraints however, it is *much* faster than using PostgreSQL.
You will need a good understanding of Solr to use this. As the current
version does not attempt to use schema-less or dynamic fields, you will
need to write a Solr schema for the target dataset, and a data importer
to index it. Note that the plugin can be extended, and it should be
possible to extend it to provide schema-less or dynamic fields support.
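To give an idea of the indexing side, here is a minimal sketch of one possible importer. It is not something the extension ships. It maps datastore rows onto the fields of a hand-written schema and builds a request for Solr's JSON update handler. The column names, Solr field names and core URL are all hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical mapping from datastore column names to the fields
# declared in a hand-written Solr schema for one dataset.
FIELD_MAP = {
    "_id": "id",
    "scientificName": "scientific_name",
    "country": "country",
}


def rows_to_docs(rows, field_map=FIELD_MAP):
    """Keep only the mapped columns, renamed to their Solr field names.

    Columns that are missing or None are left out of the document.
    """
    docs = []
    for row in rows:
        doc = {solr_name: row[col]
               for col, solr_name in field_map.items()
               if row.get(col) is not None}
        docs.append(doc)
    return docs


def build_update_request(solr_core_url, docs, commit=True):
    """Build (but do not send) a POST to the core's JSON update handler."""
    query = urlencode({"commit": "true" if commit else "false"})
    url = "{}/update?{}".format(solr_core_url.rstrip("/"), query)
    return Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the index is only refreshed when an importer like this runs, scheduling it (for example weekly) is what determines the indexing lag.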
The plugin provides an interface, IDataSolr, which is analogous to the
IDatastore interface and allows other plugins to modify the Solr queries.
We do not use this in production ourselves yet, so I wouldn't recommend
that anyone else do so. However, if anyone wants to test it or help
with the development, that would be welcome.
Finally, here is the link:
https://github.com/NaturalHistoryMuseum/ckanext-datasolr
Best Wishes,
Alice Heaton