Posted to solr-user@lucene.apache.org by Abhay Kumar <ab...@AnjuSoftware.com> on 2020/10/28 16:44:46 UTC

How to remove special characters from suggestion in Solr

Hello,

We are using the suggest component below in our Solr implementation.

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">AnalyzingInfixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">text_auto</str>
    <str name="suggestAnalyzerFieldType">prefix_text</str>
    <str name="buildOnStartup">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">FreeTextSuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">text</str>
    <str name="ngrams">5</str>
    <str name="separator"> </str>
    <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">true</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>




<field name="text_auto" type="prefix_text" multiValued="false" indexed="true" stored="true"/>
<fieldType name="prefix_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
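
To illustrate what the prefix_text analyzer above is meant to do, here is a rough Python approximation of the three steps (character replacement, whitespace tokenization, lowercasing). It is only a sketch of the intended behaviour, not Solr's actual code; the function name and the sample string are made up for illustration.

```python
import re

def analyze_prefix_text(text):
    """Approximate the prefix_text analysis chain (illustrative only)."""
    # PatternReplaceCharFilterFactory: every non-alphanumeric character becomes a space
    cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text)
    # WhitespaceTokenizerFactory: split on runs of whitespace
    tokens = cleaned.split()
    # LowerCaseFilterFactory: lowercase each token
    return [token.lower() for token in tokens]

# Hypothetical sample value, not taken from our index
sample = "Medical launch-pack\nAbiraterone (250mg)"
print(analyze_prefix_text(sample))
```

As far as we understand, this analysis only affects the tokens used for matching. The suggestion text that is returned is still the original field value, so the special characters remain visible in the results.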

One of our documents contains a large amount of data, and while syncing this document using the SolrNet library we get the exception below.

SuggestComponent Exception in building suggester index for: AnalyzingInfixSuggester
java.lang.IllegalArgumentException: Document contains at least one immense term in field="exacttext" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[77, 101, 100, 105, 99, 97, 108, 32, 108, 97, 117, 110, 99, 104, 32, 112, 97, 99, 107, 10, 65, 98, 105, 114, 97, 116, 101, 114, 111, 110]...', original message: bytes can be at most 32766 in length; got 95994
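
For reference, the byte values quoted in the message can be decoded as UTF-8 to see which text is involved, and the same limit can be used to check a value before it is sent to Solr. This is a small Python sketch outside our Solr setup; the helper name `exceeds_term_limit` is hypothetical, and the limit of 32766 bytes is taken from the exception text.

```python
# Limit quoted in the exception message (bytes of UTF-8, not characters)
MAX_TERM_BYTES = 32766

# Prefix of the immense term, copied from the exception message
prefix = [77, 101, 100, 105, 99, 97, 108, 32, 108, 97, 117, 110, 99, 104, 32,
          112, 97, 99, 107, 10, 65, 98, 105, 114, 97, 116, 101, 114, 111, 110]

decoded = bytes(prefix).decode("utf-8")
print(repr(decoded))

def exceeds_term_limit(value, limit=MAX_TERM_BYTES):
    """Return True if the UTF-8 encoding of value is longer than limit bytes."""
    return len(value.encode("utf-8")) > limit
```

The decoded prefix shows that the term starts at the beginning of the document text and still contains a line break, so the whole value seems to be indexed as a single term. Note that `exacttext` does not appear in the configuration shown here; presumably it is a field the suggester builds internally from `text_auto`, but that is an assumption on our side.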

Please help us resolve this issue.

Help with removing special characters from the suggestion results would also work for us.

Thanks.
Abhay

