[ckan-dev] prevent some fields from being used in multilingual extension

Jean Pommier jean.pommier at pi-geosolutions.fr
Fri May 17 09:56:21 UTC 2019


Hi,

I'm adding some fields to my dataset that are likely to contain very 
large values. I understand that if I want Solr to index them, I need to 
define these fields as text instead of string, or not index them at all 
(one of them is a geometry, so there is no point in indexing it). So 
far, so good.

But I have the feeling that there is a problem with the multilingual 
extension: if I activate the multilingual_dataset extension, I get the 
following error when the dataset's search index is updated:

    500 Internal Server Error
    The server has either erred or is incapable of performing the
    requested operation.

    Failed to update the search index.('Solr returned an
    error: (u\'Solr responded with an error (HTTP 400): [Reason:
    Exception writing document id 6069a7526af8874d374160fb41b34a2e to
    the index; possible analysis error: Document contains at least one
    immense term in field="text_fr" (whose UTF8 encoding is longer than
    the max length 32766), all of which were skipped. Please correct the
    analyzer to not produce such terms. The prefix of the first immense
    term is: \\\'[105, 109, 97, 103, 101, 116, 116, 101, 32, 55, 99, 54,
    49, 48, 98, 51, 102, 45, 101, 53, 100, 98, 45, 52, 57, 51, 54, 45,
    57, 99]...\\\', original message: bytes can be at most 32766 in
    length; got 251794. Perhaps the document has an indexed string field
    (solr.StrField) which is too large]\',)',)
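
For reference, the byte list in the error can be decoded to see which value Solr choked on, and the documents can be checked against the 32766-byte limit before they are sent. A minimal Python 3 sketch; the helper names are made up for illustration and are not part of CKAN or Solr:

```python
# Hard limit on a single indexed term, as quoted in the error above.
MAX_TERM_BYTES = 32766


def decode_term_prefix(byte_values):
    """Turn the list of byte values from the Solr error into readable text."""
    return bytes(byte_values).decode("utf-8", errors="replace")


def oversized_fields(doc, limit=MAX_TERM_BYTES):
    """Return {field: utf8_length} for string values longer than the limit."""
    sizes = {}
    for key, value in doc.items():
        if isinstance(value, str):
            length = len(value.encode("utf-8"))  # the limit is in bytes, not characters
            if length > limit:
                sizes[key] = length
    return sizes


# Prefix copied from the error message above.
prefix = [105, 109, 97, 103, 101, 116, 116, 101, 32, 55, 99, 54,
          49, 48, 98, 51, 102, 45, 101, 53, 100, 98, 45, 52, 57, 51, 54, 45,
          57, 99]
print(decode_term_prefix(prefix))  # imagette 7c610b3f-e5db-4936-9c
```

So the oversized term starts with "imagette" followed by what looks like a UUID, which points at one of the large custom fields rather than at the regular dataset metadata.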

This is weird since:

  * my very large fields are declared as text in the Solr schema
  * they are processed with no error without the multilingual extension
  * in multilingual, text_fr is declared as solr.TextField

so it should be fine, right?

I've just noticed that if I add <dynamicField name="text_*" type="text" 
indexed="true" stored="true" multiValued="false"/> to my schema, things 
work again. So is it just a Solr configuration issue?
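
One way to check which definitions in a schema could apply to text_fr is to list every field and dynamicField entry whose name matches it. A rough Python 3 sketch: the SAMPLE fragment below is invented for illustration (in particular the catch-all "*" string entry is an assumption, not copied from my schema), and matching_definitions is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase


def matching_definitions(schema_xml, field_name):
    """List (tag, name, type) for every definition that matches field_name."""
    root = ET.fromstring(schema_xml)
    matches = []
    for elem in root.iter():
        if elem.tag not in ("field", "dynamicField"):
            continue
        pattern = elem.get("name", "")
        is_exact = pattern == field_name
        # dynamicField names are glob patterns such as "text_*" or "*"
        is_glob = elem.tag == "dynamicField" and fnmatchcase(field_name, pattern)
        if is_exact or is_glob:
            matches.append((elem.tag, pattern, elem.get("type")))
    return matches


# Invented schema fragment, for illustration only.
SAMPLE = """<schema>
  <fields>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="text_*" type="text" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="*" type="string" indexed="true" stored="false"/>
  </fields>
</schema>"""

for tag, name, field_type in matching_definitions(SAMPLE, "text_fr"):
    print(tag, name, field_type)
```

If the only match for text_fr in the schema that is actually loaded turns out to be a string-typed definition, that would be consistent with the solr.StrField hint in the error message.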


Anyway, is there a way to tell the multilingual extension not to add 
those fields to the text_fr field for indexing?

Apart from adding the field names to KEYS_TO_IGNORE in the multilingual 
extension's plugin.py, I don't see a way.
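
To illustrate what I mean by the KEYS_TO_IGNORE approach, here is a standalone Python 3 sketch. It is not the extension's actual code: it assumes the text_<lang> field is built by concatenating every value whose key is not in the ignore list, and the field names "spatial" and "thumbnail_data" are hypothetical examples of large custom fields:

```python
# Hypothetical ignore list; the real KEYS_TO_IGNORE lives in the
# multilingual extension's plugin.py and may contain different keys.
KEYS_TO_IGNORE = {"id", "state", "spatial", "thumbnail_data"}


def build_text_field(pkg_dict, lang, ignore=KEYS_TO_IGNORE):
    """Concatenate all non-ignored values into a single text_<lang> field."""
    parts = []
    for key in sorted(pkg_dict):  # sorted only to make the output deterministic
        if key in ignore:
            continue
        value = pkg_dict[key]
        values = value if isinstance(value, list) else [value]
        parts.extend(str(v) for v in values if v)
    return {"text_" + lang: " ".join(parts)}


dataset = {
    "id": "abc",
    "title": "Carte",
    "notes": "Une carte",
    "spatial": '{"type": "Polygon", "coordinates": []}',
}
print(build_text_field(dataset, "fr"))  # {'text_fr': 'Une carte Carte'}
```

With the large fields in the ignore list, they never reach text_fr, whatever type that field has in the schema. The drawback is that this means patching the extension itself.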

Best,

Jean

-- 

*Jean Pommier -- pi-Geosolutions*

Engineer, independent consultant

Tel.: (+33) 6 09 23 21 36
E-mail: jp at pi-geosolutions.fr
Web: www.pi-geosolutions.fr
