[ckan-dev] German solr reference configuration

Ian Ward ian at excess.org
Thu Feb 21 13:34:45 UTC 2019


Stefan, this is really useful.

What do you think about adding a few paragraphs to the official ckan
docs that reference your work?

On Thu, Feb 21, 2019 at 2:59 AM Stefan Oderbolz <odi at metaodi.ch> wrote:
>
> Hi Robert
>
> We had similar problems for opendata.swiss (which is served in 4 languages). What we found very useful for German was using a DictionaryCompoundWordTokenFilterFactory.
> You can find our solr configuration on GitHub, here is the part for the German text: https://github.com/opendata-swiss/ckanext-switzerland/blob/87a9f07995f6c8a5a4230cf1760cf4ce3520ab3a/solr/schema.xml#L158-L202
>
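For readers unfamiliar with compound splitting, here is a rough Python sketch of the idea behind a dictionary-based decompounder: the original token is kept, and every dictionary word found inside it is added as an extra token, so a search for "Schiff" can match "Donaudampfschiff". This is only an illustration, not Lucene's implementation and not the opendata.swiss configuration; the function name `decompound`, the tiny word list, and the size limits are assumptions made up for the example.

```python
def decompound(token, dictionary, min_word_size=5,
               min_subword_size=2, max_subword_size=15):
    """Return the token plus every dictionary word contained in it.

    Hypothetical helper for illustration only; the size parameters are
    assumed values, not taken from any real Solr configuration.
    """
    token = token.lower()
    parts = [token]  # the original token is always kept
    if len(token) < min_word_size:
        return parts  # too short to be worth splitting
    # Slide over every start position and try every allowed subword length.
    for start in range(len(token) - min_subword_size + 1):
        for length in range(min_subword_size, max_subword_size + 1):
            if start + length > len(token):
                break
            candidate = token[start:start + length]
            if candidate in dictionary:
                parts.append(candidate)
    return parts


# Made-up three-word dictionary for the example.
words = {"donau", "dampf", "schiff"}
tokens = decompound("Donaudampfschiff", words)
```

The quality of the result depends almost entirely on the dictionary: too small and compounds stay unsplit, too large and short accidental matches add noise to the index.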
> Search is a very complex topic, so I'm sure we will continue to tweak it.
>
> Btw: since we launched the autosuggest feature, which suggests search terms to the user, we have received a lot of positive feedback. It helps users find "good" search terms that actually lead to a result.
>
> And about the "search term is in the title, but not displayed on the first page" problem: it's worth adapting the boost of the different fields in the search. We do this in the "before_search" hook (in our case to boost language-specific fields):
> https://github.com/opendata-swiss/ckanext-switzerland/blob/87a9f07995f6c8a5a4230cf1760cf4ce3520ab3a/ckanext/switzerland/plugin.py#L629-L638
>
> But you can also simply use that hook to give e.g. 8x weight to the title field compared to the description (see https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Theqf_QueryFields_Parameter for details)
>
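A minimal sketch of what such a boost could look like, assuming the hook receives the search parameters as a dict and that a "qf" entry in it is passed on to Solr's DisMax parser. The function names are hypothetical, and the field names `title` and `notes` (CKAN's description field) are assumptions about the schema in use; this is not the opendata.swiss code linked above.

```python
def build_qf(weights):
    """Turn {"field": weight} into a DisMax qf string like 'title^8 notes^1'."""
    return " ".join(f"{field}^{weight}" for field, weight in weights.items())


def boost_title_over_description(search_params):
    """Hypothetical body of a before_search hook.

    Returns a copy of the parameters with the title weighted 8x
    relative to the description. Field names are assumptions.
    """
    params = dict(search_params)  # do not mutate the caller's dict
    params["qf"] = build_qf({"title": 8, "notes": 1})
    return params


boosted = boost_title_over_description({"q": "verkehr"})
```

In a real plugin this logic would sit in the before_search method of an IPackageController implementation. Note that qf replaces the list of searched fields, so any other fields that should still be searched (tags, a catch-all text field, and so on) need to be listed there as well.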
>
> Hope this helps.
>
> - Stefan
>
>
> On Wed, Feb 20, 2019 at 4:21 PM Harm, Robert <robert.harm at brz.gv.at> wrote:
>>
>> Hello,
>>
>>
>>
>> we are getting reports that (due to our solr/lucene configuration) search results on data.gv.at are often of poor quality. For example, a dataset whose title contains the search term is not displayed on the first page of the result set.
>>
>>
>>
>> I wonder if there is a German solr reference configuration that is recommended for use with CKAN? The shipped solr config (e.g. https://github.com/ckan/ckan/blob/b9e45e2723d4abd70fa72b16ec4a0bebc795c56b/contrib/docker/solr/solrconfig.xml) seems to be optimized for the English language.
>>
>>
>>
>> Any help would be much appreciated!
>>
>> Best,
>>
>>
>>
>> Robert
>>
>> _______________________________________________
>> ckan-dev mailing list
>> ckan-dev at lists.okfn.org
>> https://lists.okfn.org/mailman/listinfo/ckan-dev
>> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
>


