[ckan-dev] prevent some fields from being used in multilingual extension
Jean Pommier
jean.pommier at pi-geosolutions.fr
Fri May 17 09:56:21 UTC 2019
Hi,
I'm adding some fields to my dataset, fields that are likely to contain
very large values. I understand that if I want solr to index them, I
need to define these fields as text instead of string, or else not
index them at all (one of them is a geometry, so there is no point
in indexing it). So far, so good.
But I have the feeling that there is a problem with the multilingual
extension: if I activate the multilingual_dataset extension, I get the
following error when the dataset index is updated:
500 Internal Server Error
The server has either erred or is incapable of performing the
requested operation.
Échec de mise à jour de l'index de recherche.('Solr returned an
error: (u\'Solr responded with an error (HTTP 400): [Reason:
Exception writing document id 6069a7526af8874d374160fb41b34a2e to
the index; possible analysis error: Document contains at least one
immense term in field="text_fr" (whose UTF8 encoding is longer than
the max length 32766), all of which were skipped. Please correct the
analyzer to not produce such terms. The prefix of the first immense
term is: \\\'[105, 109, 97, 103, 101, 116, 116, 101, 32, 55, 99, 54,
49, 48, 98, 51, 102, 45, 101, 53, 100, 98, 45, 52, 57, 51, 54, 45,
57, 99]...\\\', original message: bytes can be at most 32766 in
length; got 251794. Perhaps the document has an indexed string field
(solr.StrField) which is too large]\',)',)
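As a side note, the "prefix of the first immense term" in that message
is a list of UTF-8 byte values, so it can be decoded to see which value
is the culprit. A minimal sketch (the helper name decode_term_prefix is
made up for illustration, it is not part of CKAN or solr):

```python
# Byte values copied from the "prefix of the first immense term" part
# of the error message above.
prefix_bytes = [105, 109, 97, 103, 101, 116, 116, 101, 32, 55, 99, 54,
                49, 48, 98, 51, 102, 45, 101, 53, 100, 98, 45, 52, 57,
                51, 54, 45, 57, 99]


def decode_term_prefix(values):
    """Turn the UTF-8 byte values reported by solr back into text.

    Hypothetical helper, for illustration only. bytearray is used so
    the same code behaves identically on Python 2 and Python 3.
    """
    return bytearray(values).decode("utf-8", "replace")


print(decode_term_prefix(prefix_bytes))
```

Here it decodes to "imagette 7c610b3f-e5db-4936-9c", so the oversized
term starts with the word "imagette" followed by what looks like the
beginning of a UUID.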
This is weird since:
* my very large fields are declared as text in the solr schema
* they are processed with no error without the multilingual extension
* in multilingual, text_fr is declared as solr.TextField
so it should be fine, right?
I've just noticed that if I add <dynamicField name="text_*" type="text"
indexed="true" stored="true" multiValued="false"/> to my schema, things
work again. So is it just a solr config issue?
Anyway, is there a way to tell the multilingual extension not to add
those fields to the text_fr field for indexing?
Apart from adding the field names to KEYS_TO_IGNORE in multilingual's
plugin.py, I don't see a way.
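One alternative I can imagine, without patching plugin.py, is a small
custom plugin whose before_index hook removes the large fields from the
dict before the multilingual plugin sees it. This is only a sketch under
assumptions: the field names below are made up, the helper is
hypothetical, and it only helps if my plugin's before_index runs before
multilingual_dataset's (which I assume depends on the order in
ckan.plugins). It also drops the fields from the index document
entirely, not only from text_fr.

```python
def drop_fields_before_index(pkg_dict, fields_to_skip):
    """Return a copy of pkg_dict without the given keys.

    Hypothetical helper: it would be called from the before_index()
    method of a custom IPackageController plugin, which receives the
    dict about to be sent to solr and returns the dict to index.
    """
    skip = set(fields_to_skip)
    return {key: value for key, value in pkg_dict.items()
            if key not in skip}


# Made-up field names standing in for my large custom fields.
LARGE_FIELDS = ["my_geometry", "my_large_text"]

doc = {
    "id": "6069a7526af8874d374160fb41b34a2e",
    "title": "Some dataset",
    "my_geometry": "x" * 251794,  # far above the 32766-byte term limit
}
cleaned = drop_fields_before_index(doc, LARGE_FIELDS)
print(sorted(cleaned))
```

The original dict is left untouched, so the plugin can still use the
full values for something else if needed.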
Best,
Jean
--
*Jean Pommier -- pi-Geosolutions*
Ingénieur, consultant indépendant
Tél. : (+33) 6 09 23 21 36
E-mail : jp at pi-geosolutions.fr
Web : www.pi-geosolutions.fr