[ckan-dev] German solr reference configuration

Stefan Oderbolz odi at metaodi.ch
Thu Feb 21 07:59:21 UTC 2019


Hi Robert

We had similar problems with opendata.swiss (which is served in 4
languages). What we found very useful for German was a
DictionaryCompoundWordTokenFilterFactory, which lets a search also match
the parts of a compound word.
You can find our Solr configuration on GitHub; here is the part for the
German text:
https://github.com/opendata-swiss/ckanext-switzerland/blob/87a9f07995f6c8a5a4230cf1760cf4ce3520ab3a/solr/schema.xml#L158-L202
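
To illustrate the idea behind dictionary-based decompounding, here is a
toy sketch in Python. It is not Lucene's implementation and not our
configuration: the function name, the word list and the size limits are
made up for the example. It only shows why a search for "unfall" can match
a dataset titled "Verkehrsunfall" once the filter is in place.

```python
def decompound(token, dictionary, min_subword=3, max_subword=15):
    """Return the original token followed by every dictionary word found in it.

    Toy illustration only: `dictionary` is a set of lowercase words and the
    size limits are hypothetical defaults chosen for this example.
    """
    lowered = token.lower()
    parts = []
    for start in range(len(lowered)):
        for length in range(min_subword, max_subword + 1):
            end = start + length
            if end > len(lowered):
                break  # no longer substrings possible from this start
            candidate = lowered[start:end]
            if candidate in dictionary:
                parts.append(candidate)
    # The original token is kept, the subwords are indexed in addition.
    return [token] + parts


if __name__ == "__main__":
    words = {"verkehr", "unfall", "statistik"}
    print(decompound("Verkehrsunfall", words))
```

The quality of the result depends almost entirely on the word list, so
expect to maintain it over time.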

Search is a very complex topic, so I'm sure we will continue to tweak it.

Btw: since we launched the autosuggest feature, which suggests search terms
to the user, we have received a lot of positive feedback. It helps users
find "good" search terms that actually lead to a result.

And about the "search term is in the title, but not displayed on the first
page" problem: it's worth adapting the boost of the different fields in the
search. We do this in the "before_search" hook (in our case to boost
language-specific fields):
https://github.com/opendata-swiss/ckanext-switzerland/blob/87a9f07995f6c8a5a4230cf1760cf4ce3520ab3a/ckanext/switzerland/plugin.py#L629-L638

You can use the same hook to give e.g. 8x weight to the title field
compared to the description (see
https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Theqf_QueryFields_Parameter
for details).
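
A minimal sketch of such a boost, assuming the field names "title", "notes"
(the description) and "text" (the catch-all field) from the stock CKAN
schema. The weights and the helper names are only examples, not what we run
in production. In a real extension the before_search function would be a
method on a plugin class that implements IPackageController; it is written
as a plain function here so it can be tried without CKAN installed.

```python
# Example weights only: title counts 8x as much as the description.
FIELD_BOOSTS = {"title": 8, "notes": 1, "text": 1}


def build_qf(boosts):
    """Format a mapping of field -> weight as a dismax "qf" string."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


def before_search(search_params):
    """Return a copy of the search parameters with the field boosts set.

    Mirrors the shape of CKAN's before_search hook, which receives the
    search parameters as a dict and returns the (modified) dict.
    """
    params = dict(search_params)  # do not mutate the caller's dict
    params["qf"] = build_qf(FIELD_BOOSTS)
    return params


if __name__ == "__main__":
    print(before_search({"q": "verkehr"}))
```

If your schema uses language-specific fields, put those field names in the
mapping instead. Reindexing is not needed for a change to "qf", since it
only affects how queries are scored.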


Hope this helps.

- Stefan


On Wed, Feb 20, 2019 at 4:21 PM Harm, Robert <robert.harm at brz.gv.at> wrote:

> Hello,
>
>
>
> we are getting reports that (due to our solr/lucene configuration) search
> results on data.gv.at are often of low quality, meaning that if
> e.g. the search term is in the title of an indexed dataset, it is not
> displayed on the first page of the result set.
>
>
>
> I wonder if there is a German solr reference configuration which is
> recommended to be used with CKAN? The shipped solr config (e.g.
> https://github.com/ckan/ckan/blob/b9e45e2723d4abd70fa72b16ec4a0bebc795c56b/contrib/docker/solr/solrconfig.xml)
> seems to be optimized for the English language.
>
>
>
> Any help would be much appreciated!
>
> Best,
>
>
>
> Robert