Posted to solr-user@lucene.apache.org by Szűcs Roland <sz...@bookandwalk.hu> on 2020/03/28 16:29:10 UTC

Spellchecker offers fewer alternatives

Hi All,

My question is whether this is a feature or a bug in the Solr spellchecker
with the default distance measure and maxEdits 2:
A multiValued field contains "József" and its ASCII-folded version "Jozsef",
to support mobile search, where users usually do not take the time to type
József.
When I query with spellcheck.q=Józzef, interestingly, I get back only Jozsef
as an alternative.

Is it expected that, for multiValued fields, only one term is returned?
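To make the first question concrete, here is a simplified model of how the maxEdits, minPrefix and minQueryLength values from the solrconfig.xml quoted below narrow the candidate list. It is only a sketch under stated assumptions: plain Levenshtein distance stands in for the "internal" distance measure, accuracy, maxInspections and maxQueryFrequency are ignored, and the dictionary is just the lowercased terms of the field. It is not Solr's implementation.

```python
# Simplified model of spellchecker candidate filtering (not Solr's code).
# Assumptions: plain Levenshtein distance approximates the "internal"
# distance measure; accuracy, maxInspections and maxQueryFrequency are ignored.
MAX_EDITS = 2         # maxEdits in solrconfig.xml
MIN_PREFIX = 2        # minPrefix: leading characters that must match exactly
MIN_QUERY_LENGTH = 4  # minQueryLength: shorter terms get no suggestions


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def candidates(term: str, dictionary: list[str]) -> list[tuple[str, int]]:
    """Return (word, distance) pairs that survive the filters, closest first."""
    if len(term) < MIN_QUERY_LENGTH:
        return []
    found = []
    for word in dictionary:
        # Skip the term itself and words whose first MIN_PREFIX chars differ.
        if word == term or word[:MIN_PREFIX] != term[:MIN_PREFIX]:
            continue
        distance = levenshtein(term, word)
        if distance <= MAX_EDITS:
            found.append((word, distance))
    return sorted(found, key=lambda pair: pair[1])


# Lowercased indexed terms; preserveOriginal keeps both the accented and folded form.
dictionary = ["józsef", "jozsef", "attila"]
print(candidates("józzef", dictionary))  # [('józsef', 1)]
print(candidates("jozzef", dictionary))  # [('jozsef', 1)]
print(candidates("atila", dictionary))   # [('attila', 1)]
```

In this model the two-character prefix check, not the edit distance, is what separates józsef from jozsef, so which one survives depends on whether the query term arrives accented or folded.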

Secondly, I tried collations with spellcheck.q="Józzef Atila", where the
actual author field contains either József Attila or Jozsef Attila.

I got the same suggestion for Józzef as before, and Atila was correctly
corrected to Attila, but the collations are always null in SolrJ with Solr 8.4.1.
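For reference, Solr only builds collations when spellcheck.collate=true is part of the request; that parameter defaults to false. A sketch of the full parameter set for the two-word query, assuming a hypothetical core named `books` and a hypothetical `/spell` handler that includes the spellcheck component (the post does not show which handler is used):

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the core "books" and handler "/spell" are placeholders.
BASE_URL = "http://localhost:8983/solr/books/spell"

params = {
    "spellcheck": "true",
    "spellcheck.dictionary": "default",           # spellchecker name from solrconfig.xml
    "spellcheck.q": "Józzef Atila",
    "spellcheck.collate": "true",                 # collations are off unless requested
    "spellcheck.collateExtendedResults": "true",  # expanded collation details
    "wt": "json",
}

# urlencode percent-encodes non-ASCII characters as UTF-8 and spaces as '+'.
url = BASE_URL + "?" + urlencode(params)
print(url)
```

In SolrJ the same parameters can be set on a `SolrQuery` with `set(...)`, and the collations read from `QueryResponse.getSpellCheckResponse().getCollatedResults()`.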
Here is the relevant part of my solrconfig.xml:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">textSpell</str>
      <str name="queryAnalyzerFieldType">shortTextSpell</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">2</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">0.01</float>
    </lst>
</searchComponent>

schema:
<fieldType name="shortTextSpell" class="solr.TextField"
           positionIncrementGap="100" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_hu.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_hu.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
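The two analyzer chains differ in one place: the index side keeps the accented original next to the folded token (preserveOriginal="true"), while the query side keeps only the folded form. A rough approximation, assuming whitespace splitting in place of StandardTokenizer, no stopword list, and diacritic stripping via Unicode decomposition in place of ASCIIFoldingFilter:

```python
import unicodedata

# Rough approximation of the shortTextSpell analyzers. Assumptions:
# whitespace splitting replaces StandardTokenizer, stopwords are skipped,
# and stripping combining marks replaces ASCIIFoldingFilter (good enough
# for Hungarian accents such as ó, é, ő, ű, but not a full ASCII folding).


def fold(token: str) -> str:
    """Remove diacritics, e.g. 'józsef' -> 'jozsef'."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str, preserve_original: bool) -> list[str]:
    tokens = []
    for raw in text.split():
        token = raw.lower()        # LowerCaseFilterFactory
        folded = fold(token)       # ASCIIFoldingFilterFactory (approximated)
        tokens.append(folded)
        if preserve_original and folded != token:
            tokens.append(token)   # keep the accented original too (order here is arbitrary)
    return tokens


print(analyze("József Attila", preserve_original=True))  # ['jozsef', 'józsef', 'attila']
print(analyze("Józzef Atila", preserve_original=False))  # ['jozzef', 'atila']
```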

Thanks in advance,
Roland