[ckan-dev] German solr reference configuration

Stefan Oderbolz odi at metaodi.ch
Thu Feb 21 13:38:36 UTC 2019


Hi Ian

Yes, that's a good idea. I'll start writing something up (after my holidays
;) ).

- Stefan

On Thu, Feb 21, 2019 at 2:35 PM Ian Ward <ian at excess.org> wrote:

> Stefan, this is really useful.
>
> What do you think about adding a few paragraphs to the official CKAN
> docs that reference your work?
>
> On Thu, Feb 21, 2019 at 2:59 AM Stefan Oderbolz <odi at metaodi.ch> wrote:
> >
> > Hi Robert
> >
> > We had similar problems with opendata.swiss (which is served in four
> > languages). What we found very useful for German was a
> > DictionaryCompoundWordTokenFilterFactory.
> > You can find our Solr configuration on GitHub; here is the part for the
> > German text:
> > https://github.com/opendata-swiss/ckanext-switzerland/blob/87a9f07995f6c8a5a4230cf1760cf4ce3520ab3a/solr/schema.xml#L158-L202
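
To illustrate what dictionary-based decompounding does to a German compound, here is a simplified Python sketch. It is not Solr's implementation and it is not taken from the linked schema.xml: the function name `decompound`, the tiny word list, and the default values are assumptions chosen for illustration, with parameter names that loosely mirror the filter's options.

```python
def decompound(token, dictionary, min_word_size=5, min_subword_size=2,
               max_subword_size=15, only_longest_match=True):
    """Return the original token followed by the dictionary subwords found in it.

    Hypothetical helper: a rough model of dictionary-based compound splitting,
    not the actual Lucene/Solr code.
    """
    tokens = [token]  # the original token is always kept
    if len(token) < min_word_size:
        return tokens  # short words are not decompounded

    lowered = token.lower()
    for start in range(len(lowered) - min_subword_size + 1):
        longest = None
        for size in range(min_subword_size, max_subword_size + 1):
            if start + size > len(lowered):
                break
            part = lowered[start:start + size]
            if part in dictionary:
                if only_longest_match:
                    # sizes grow, so the last hit at this position is the longest
                    longest = part
                else:
                    tokens.append(part)
        if longest is not None:
            tokens.append(longest)
    return tokens


# Assumed example dictionary; a real setup would load a full German word list.
words = {"trink", "wasser", "qualität"}
print(decompound("Trinkwasserqualität", words))
```

With subwords indexed alongside the compound, a search for "Wasser" can match a dataset whose title only contains "Trinkwasserqualität".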
> >
> > Search is a very complex topic, so I'm sure we will continue to tweak it.
> >
> > Btw: since we launched the autosuggest feature, which suggests search
> > terms to the user, we have received a lot of positive feedback. It helps
> > users find "good" search terms that actually lead to results.
> >
> > As for "the search term is in the title, but not displayed on the first
> > page": it's worth adjusting the boost of the different fields in the
> > search. We do this in the "before_search" hook (in our case to boost
> > language-specific fields):
> >
> > https://github.com/opendata-swiss/ckanext-switzerland/blob/87a9f07995f6c8a5a4230cf1760cf4ce3520ab3a/ckanext/switzerland/plugin.py#L629-L638
> >
> > But you can also simply use that hook to give, e.g., 8x the weight to the
> > title field compared to the description (see
> > https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Theqf_QueryFields_Parameter
> > for details).
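
A minimal sketch of the 8x title boost described above, written as plain Python so it runs without CKAN installed. It is not the code from the linked plugin.py. The helper name `build_qf`, the weights, and the field names `title`, `notes` (assumed here to hold the description) and `text` are assumptions; in a real extension the body of `before_search` would live in a plugin class implementing the search hook, and the `qf` parameter only takes effect with a DisMax-style query parser.

```python
def build_qf(boosts):
    """Render {'title': 8, 'notes': 1} as a Solr qf string: 'title^8 notes^1'."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


def before_search(search_params):
    """Sketch of a before_search hook body: set field boosts, return the params."""
    params = dict(search_params)  # work on a copy, leave the caller's dict alone
    # Assumed weights: title counts 8x as much as the description and full text.
    params["qf"] = build_qf({"title": 8, "notes": 1, "text": 1})
    return params


print(before_search({"q": "wasser"}))
```

Keeping the weights in one dictionary makes it easy to try different values and compare result rankings.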
> >
> >
> > Hope this helps.
> >
> > - Stefan
> >
> >
> > On Wed, Feb 20, 2019 at 4:21 PM Harm, Robert <robert.harm at brz.gv.at> wrote:
> >>
> >> Hello,
> >>
> >>
> >>
> >> We are getting reports that (due to our Solr/Lucene configuration)
> >> search results on data.gv.at are often of low quality. For example, a
> >> dataset whose title contains the search term is often not displayed on
> >> the first page of the result set.
> >>
> >>
> >>
> >> I wonder if there is a German Solr reference configuration that is
> >> recommended for use with CKAN? The shipped Solr config (e.g.
> >> https://github.com/ckan/ckan/blob/b9e45e2723d4abd70fa72b16ec4a0bebc795c56b/contrib/docker/solr/solrconfig.xml)
> >> seems to be optimized for the English language.
> >>
> >>
> >>
> >> Any help would be much appreciated!
> >>
> >> Best,
> >>
> >>
> >>
> >> Robert
> >>
> >> _______________________________________________
> >> ckan-dev mailing list
> >> ckan-dev at lists.okfn.org
> >> https://lists.okfn.org/mailman/listinfo/ckan-dev
> >> Unsubscribe: https://lists.okfn.org/mailman/options/ckan-dev
> >
>