Posted to dev@lucene.apache.org by Rajesh Somvanshi <ra...@gmail.com> on 2020/08/12 06:56:31 UTC

Error - Document contains at least one immense term

Hi Team,


I am using the Solr library to index my documents. It works as expected,
but sometimes I get the error below. Could you please help with this?



The document contains at least one immense term in
field="FileContent_en***" (whose UTF8 encoding is longer than the max
length 32766), all of which were skipped. Please correct the analyzer to
not produce such terms. The prefix of the first immense term is: '[110, 97,
109, 101, 61, 34, 97, 99, 113, 117, 105, 115, 105, 116, 105, 111, 110, 115,
116, 111, 114, 101, 34, 62, 101, 106, 122, 107, 118, 118]...', original
message: bytes can be at most 32766 in length; got 422071. Perhaps the
document has an indexed string field (solr.StrField) which is too large
solr.field





Thanks

Rajesh Somvanshi

Re: Error - Document contains at least one immense term

Posted by Erick Erickson <er...@gmail.com>.

"Perhaps the document has an indexed string field (solr.StrField) which is too large solr.field”

String-based fields (solr.StrField) are indexed as a single term, and a term can be at most 32766 bytes (roughly 32K). If you mean the field to be searchable, a 32K field _as a single term_ makes little sense.
You probably want to change the fieldType to a text-based field that’s analyzed (i.e. broken up into tokens). The
default schemas have a predefined text_general type that you can use to get started.
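
To see what the offending term looks like, the byte prefix quoted in the error message can be decoded. This is only a minimal Python sketch, not part of Solr or Lucene; the variable names are made up for illustration, and the numbers are copied from the error message.

```python
# Byte prefix of the first immense term, copied from the error message.
prefix = [110, 97, 109, 101, 61, 34, 97, 99, 113, 117, 105, 115, 105, 116,
          105, 111, 110, 115, 116, 111, 114, 101, 34, 62, 101, 106, 122,
          107, 118, 118]

# Maximum term length in bytes, as stated in the error message.
MAX_TERM_BYTES = 32766
# Size of the offending term, also taken from the error message.
reported_term_bytes = 422071

# The prefix is UTF-8 encoded text, so it can be decoded directly.
decoded_prefix = bytes(prefix).decode("utf-8")
print(decoded_prefix)                        # name="acquisitionstore">ejzkvv
print(reported_term_bytes > MAX_TERM_BYTES)  # True
```

The decoded prefix looks like a markup fragment followed by a long unbroken run of characters, which suggests that a large chunk of file content is reaching the index as one term.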


Best,
Erick

> On Aug 12, 2020, at 7:20 AM, Atri Sharma <at...@apache.org> wrote:
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Error - Document contains at least one immense term

Posted by Atri Sharma <at...@apache.org>.

Perhaps you have a field which exceeds the set limit?
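
One way to find out which field it is would be to measure the UTF-8 length of each value on the client before sending the document. The following is a rough Python sketch under that assumption; oversized_fields and the field names are hypothetical and not part of any Solr client API. The check only matters for fields that are indexed as a single term, since the limit applies per term.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the error message


def oversized_fields(doc, limit=MAX_TERM_BYTES):
    """Return the names of string fields whose UTF-8 encoding exceeds `limit` bytes."""
    return [
        name
        for name, value in doc.items()
        # The limit is in bytes, not characters, so encode before measuring.
        if isinstance(value, str) and len(value.encode("utf-8")) > limit
    ]


# Hypothetical document with one short field and one very large field.
doc = {"id": "1", "body": "x" * 40000}
print(oversized_fields(doc))  # ['body']
```

Values flagged this way are candidates for an analyzed text field type rather than a string type.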

On Wed, Aug 12, 2020 at 4:47 PM Rajesh Somvanshi
<ra...@gmail.com> wrote:

-- 
Regards,

Atri
Apache Concerted
