[ckan-discuss] CKAN and SOLR

Rufus Pollock rufus.pollock at okfn.org
Tue Feb 23 13:01:40 GMT 2010


On 23 February 2010 11:30, David Read <david.read at okfn.org> wrote:
> We're looking to add Apache SOLR structured search as an option with
> CKAN, providing it as an alternative to the Google-style search we
> already have. I've sketched out the plan and some of the issues here:
>
> http://knowledgeforge.net/ckan/trac/wiki/SolrInterface

Great stuff Dave. I'd suggest renaming this after the problem rather
than a potential solution:

"Rich search support (e.g. full-text) for CKAN (and client codebases)"

The requirements I can think of:

  1. Full-text (for some fields)
  2. Partial matching (for some fields)
  3. Faceted search
  4. Scalability and speed (how much data, how fast etc)
  5. Incremental updates etc
  6. APIs (e.g. JSON), language bindings (e.g. Python), pluggability
     (e.g. Drupal module)
  7. More I haven't thought of!

Our current approach is to do this using PostgreSQL's built-in full-text
search support. This covers 1 (FTS), and we take care of 5 and 6
ourselves (though not the Drupal end of things). For our problem 4 isn't
a huge issue (our DB is relatively small so far!).
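
To make the DB approach concrete, here is a minimal sketch of the kind of
query involved. The table name `package` and the columns `name`, `title`
and `notes` are hypothetical, chosen for illustration only, and are not
necessarily CKAN's actual schema. The function only builds the SQL and its
parameters; running it would need a Postgres connection (e.g. via
psycopg2), which is left out here.

```python
def build_fts_query(search_terms, limit=20):
    """Return (sql, params) for a PostgreSQL full-text search.

    Assumption: a hypothetical table `package` with text columns
    `name`, `title` and `notes`.
    """
    sql = (
        "SELECT name, title "
        "FROM package "
        # Build a tsvector from the searchable fields on the fly;
        # coalesce() guards against NULL columns.
        "WHERE to_tsvector('english', "
        "coalesce(title, '') || ' ' || coalesce(notes, '')) "
        # plainto_tsquery() turns free text into an AND-ed tsquery.
        "@@ plainto_tsquery('english', %s) "
        "LIMIT %s"
    )
    # Values are passed separately so the DB driver can escape them.
    return sql, (search_terms, limit)


sql, params = build_fts_query("water quality")
```

In practice one would probably store the tsvector in an indexed column
rather than computing it per query, but that is a tuning detail.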

Moving things out of the DB has several big attractions. AFAICT the
three best-known options at the moment are:

  * Lucene/Solr
  * Sphinx
  * Xapian

I've only got personal experience of the last (plus some indirect
experience of the first), but my general impression from the 'net is
that the first two seem to be the most hard-core, while Xapian has a
good reputation for ease of use and deployability but is not quite as
feature-rich or scalable.

Thus in my view the real choice would be between:
  * DB (keeping things how they are but could add e.g. partial matching easily)
  * Solr
  * Xapian

If faceted search is a must then I think the only real option is Solr.
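
For illustration, a faceted query against Solr is just an HTTP request
with a few extra parameters. The sketch below only builds the URL. The
base URL (a default local Solr install) and the field names `tags` and
`license` are assumptions for the example, not something we have set up.

```python
from urllib.parse import urlencode


def build_facet_url(query, facet_fields,
                    base_url="http://localhost:8983/solr/select"):
    """Build a Solr select URL that asks for facet counts.

    Assumptions: `base_url` points at a local Solr instance, and the
    names in `facet_fields` exist as indexed fields in its schema.
    """
    params = [
        ("q", query),        # the main search query
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # switch faceting on
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


url = build_facet_url("water", ["tags", "license"])
```

The response would then carry, alongside the matching documents, a count
of matches per value for each requested field.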

Regards,

Rufus


