Posted to solr-user@lucene.apache.org by Szűcs Roland <sz...@bookandwalk.hu> on 2019/07/30 13:50:07 UTC

Problem with solr suggester in case of non-ASCII characters

Hi All,

I have an author suggester (a search component and the related request
handler) defined in solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
    <!-- Each suggester component must have a different file path to avoid
    write-lock issues -->
    <lst name="suggester">
      <str name="name">author</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">BOOK_productAuthor</str>
      <str name="suggestAnalyzerFieldType">short_text_hu</str>
      <str name="indexPath">suggester_infix_author</str>
      <str name="buildOnStartup">false</str>
      <str name="buildOnCommit">false</str>
      <str name="minPrefixChars">2</str>
    </lst>
</searchComponent>

<requestHandler name="/suggesthandler" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">author</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

The author field has only minimal text processing at query and index time,
based on the following definition:
<fieldType name="short_text_hu" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_hu.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt" ignoreCase="true"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>

When I use queries with only ASCII characters, the results are correct:
"Al":{
"term":"<b>Al</b>exandre Dumas", "weight":0, "payload":""}

When I try it with a Hungarian author name containing a special character:
"Jó":"author":{
"Jó":{ "numFound":0, "suggestions":[]}}

When I try it with three letters, it works again:
"Józ":"author":{
"Józ":{ "numFound":10, "suggestions":[
{ "term":"Bajza <b>Józ</b>sef", "weight":0, "payload":""},
{ "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
{ "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
{ "term":"Eötvös <b>Józ</b>sef", "weight":0, "payload":""},
{ "term":"<b>Józ</b>sef Attila", "weight":0, "payload":""}..

Any idea how a longer string can have more matches than a shorter one? It is
inconsistent. What can I do to fix it? It would result in a poor customer
experience: users would feel that they sometimes need 2 and sometimes 3
characters to get suggestions.

Thanks in advance,
Roland

Re: Problem with solr suggester in case of non-ASCII characters

Posted by Szűcs Roland <sz...@bookandwalk.hu>.

Hi Erick,

Thanks for your advice.
I have already removed the stop word filter from the field definition used by
the suggester, and it works great. I will consider removing it from the
processing of the other fields as well. I have only 7000 docs with an index
size of 18MB so far, so the memory footprint is not a key issue for me.

Best,
Roland

Erick Erickson <er...@gmail.com> wrote (on Wed, Jul 31, 2019, 14:24):

> Roland:
>
> Have you considered just not using stopwords anywhere? Largely they’re a
> holdover from a long time ago when every byte counted. Plus using stopwords
> has “interesting” issues with things like highlighting and phrase queries
> and the like.
>
> Sure, not using stopwords will make your index larger, but so will a
> copyfield…
>
> Your call of course, but stopwords are over-used IMO.
>
> I’m stealing Walter Underwood’s thunder here ;)
>
> Best,
> Erick

Re: Problem with solr suggester in case of non-ASCII characters

Posted by Erick Erickson <er...@gmail.com>.

Roland:

Have you considered just not using stopwords anywhere? Largely they’re a holdover
from a long time ago when every byte counted. Plus using stopwords has “interesting”
issues with things like highlighting and phrase queries and the like.

Sure, not using stopwords will make your index larger, but so will a copyfield…

Your call of course, but stopwords are over-used IMO.

I’m stealing Walter Underwood’s thunder here ;)

Best,
Erick

> On Jul 30, 2019, at 2:11 PM, Szűcs Roland <sz...@bookandwalk.hu> wrote:
> 
> Hi Furkan,
> 
> Thanks for the suggestion. I always forget the most effective debugging
> tool: the analysis page.
>
> It turned out that "Jó" was a stop word, so it was eliminated during text
> analysis. I will create a new field type without stop word removal and use
> it like this:
> <str name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal</str>
> 
> Thanks again
> 
> Roland

Re: Problem with solr suggester in case of non-ASCII characters

Posted by Szűcs Roland <sz...@bookandwalk.hu>.

Hi Furkan,

Thanks for the suggestion. I always forget the most effective debugging
tool: the analysis page.

It turned out that "Jó" was a stop word, so it was eliminated during text
analysis. I will create a new field type without stop word removal and use
it like this:
<str name="suggestAnalyzerFieldType">short_text_hu_without_stop_removal</str>
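
A minimal sketch of what such a field type could look like, assuming it is
simply short_text_hu with the two StopFilterFactory lines dropped. The name
comes from the line above; the thread does not show the final definition, so
the rest is an assumption:

```xml
<!-- Sketch only: same chain as short_text_hu, but without StopFilterFactory,
     so short stop words such as "Jó" survive analysis and can be matched. -->
<fieldType name="short_text_hu_without_stop_removal" class="solr.TextField"
           positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since buildOnStartup and buildOnCommit are both false in the suggester
configuration, the suggester index would presumably need an explicit rebuild
(a request with suggest.build=true) before the new analyzer takes effect.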

Thanks again

Roland

Furkan KAMACI <fu...@gmail.com> wrote (on Tue, Jul 30, 2019, 16:17):

> Hi Roland,
>
> Could you check the Analysis tab (
> https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell us
> how the term is analyzed for both query and index?
>
> Kind Regards,
> Furkan KAMACI

Re: Problem with solr suggester in case of non-ASCII characters

Posted by Furkan KAMACI <fu...@gmail.com>.

Hi Roland,

Could you check the Analysis tab (
https://lucene.apache.org/solr/guide/8_1/analysis-screen.html) and tell us how
the term is analyzed for both query and index?

Kind Regards,
Furkan KAMACI
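
For reference, the same check can also be made over HTTP through Solr's
FieldAnalysisRequestHandler. Below is a sketch of registering it explicitly,
assuming the path /analysis/field. Recent Solr versions are understood to
register this handler implicitly, in which case the fragment may be
unnecessary:

```xml
<!-- Assumed explicit registration; skip it if the handler is already
     available implicitly at /analysis/field in your Solr version. -->
<requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" startup="lazy"/>
```

A request with analysis.fieldtype=short_text_hu, analysis.fieldvalue=József
Attila and analysis.query=Jó should then list the tokens produced at each
stage of the index and query analyzers, which makes it visible if a term is
dropped by a filter along the way.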
