[ckan-dev] Issue with new Solr Schema + faceting with extras

Adrià Mercader amercadero at gmail.com
Thu Dec 8 11:36:07 UTC 2011


Hi all,
Another step in the long road to Solr enlightenment :)

Recently John introduced some changes in the schema to fix #1455 (show
extras in search results with all_fields=true)
https://github.com/okfn/ckan/commit/05b675a4314ad269c6e6a095d57e3f2a21e771eb#diff-0

That worked great, but it also changed the way facets for extras are
created, which produced some weird results (extras_publishertype
should be "primary_source" and extras_filetype should be "activity" /
"organisation"):

<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="extras_publishertype">
	<int name="primari">1180</int>
	<int name="primarysourc">1180</int>
	<int name="sourc">1180</int>
  </lst>
  <lst name="extras_filetype">
	<int name="activ">1173</int>
	<int name="organis">7</int>
  </lst>
 </lst>
 <lst name="facet_dates"/>
</lst>
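
A likely reading of the output above: Solr field facets count the
terms stored in the index, not the original values, so an analyzed
field facets on its tokens. The toy sketch below illustrates only
that effect. The text_terms function is a crude stand-in written for
this example (lowercase, then split on non-alphanumerics); it is not
Solr's analyzer chain, and it does no stemming or word catenation,
which is presumably where forms like "primari" and "primarysourc"
come from. All function names are hypothetical.

```python
import re
from collections import Counter


def text_terms(value):
    """Crude stand-in for an analyzed "text" field (assumption, not Solr).

    Lowercases the value and splits it on anything that is not a letter
    or digit, so "primary_source" becomes two separate terms.
    """
    return [t for t in re.split(r"[^a-z0-9]+", value.lower()) if t]


def string_terms(value):
    """Stand-in for a "string" field: the whole value is a single term."""
    return [value]


def facet_counts(values, analyze):
    """Count, for each indexed term, how many documents contain it."""
    counts = Counter()
    for value in values:
        # A document counts once per distinct term it contains.
        counts.update(set(analyze(value)))
    return dict(counts)


# Three packages that all carry the same extra value.
docs = ["primary_source"] * 3

tokenized = facet_counts(docs, text_terms)    # one bucket per token
verbatim = facet_counts(docs, string_terms)   # one bucket per value
```

With the tokenized field every token gets its own facet bucket with
the full document count, which matches the shape of the results
above; the verbatim field yields a single bucket per original value.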

After spending a while playing with the properties of the "extras_*"
field, my conclusion is that dynamicField + type="text" does not play
well with faceting. Two options here:

1 - Change the type of the extras_* dynamicField to "string", which
fixes the issue. But the "string" type is much more limited for
searching (case sensitive, no synonyms...), which is why "text" is
used on titles, notes, etc., so results that depend on "complex"
strings in extras could lose quality.

2 - Apart from the "extras_*" field, when indexing we also add all
extras to the main namespace of the package (so in the Solr index
there are, e.g., "publishertype" and "filetype" fields). In our schema
there is a catch-all dynamicField with type "string" that matches
these fields:

<dynamicField name="*" type="string" indexed="true"  stored="false"/>

So if you facet by these fields, you get the expected results:
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="publishertype">
	<int name="primary_source">1180</int>
  </lst>
  <lst name="filetype">
	<int name="activity">1173</int>
	<int name="organisation">7</int>
  </lst>
 </lst>
 <lst name="facet_dates"/>
</lst>

This option is a good one because we don't change the schema and
extras_* still has type="text", but there is a slight chance that
projects or extensions that facet on extras will need to facet by
"extraname" instead of "extras_extraname". The only one I know of is
IATI, which I'm happy to change, but there may be others out there.
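
For an extension that currently facets on "extras_*" names, the
change could be as small as stripping the prefix before building the
Solr request. A minimal sketch in Python 3, assuming a Solr select
handler at a made-up local URL; SOLR_SELECT_URL and both helper
functions are hypothetical and not part of CKAN. Only the standard
Solr parameters q, rows, facet and facet.field are used.

```python
from urllib.parse import urlencode

# Assumption: placeholder URL, adjust to the actual Solr instance.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def facet_field_for_extra(name):
    """Map "extras_foo" to the top-level "foo" field; leave others as is."""
    prefix = "extras_"
    return name[len(prefix):] if name.startswith(prefix) else name


def build_facet_query(extra_names, base_url=SOLR_SELECT_URL):
    """Build a facet-only query URL (rows=0) over the given extras."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    # One facet.field parameter per extra, using the top-level name.
    params += [("facet.field", facet_field_for_extra(n)) for n in extra_names]
    return base_url + "?" + urlencode(params)


url = build_facet_query(["extras_publishertype", "extras_filetype"])
```

The resulting URL facets on "publishertype" and "filetype", i.e. the
fields picked up by the catch-all "string" dynamicField.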

I would vote for option 2 if nobody objects / has an alternative.

Sorry for the long email,

Adrià



