
Posted to solr-user@lucene.apache.org by Anupam Bhattacharya <an...@gmail.com> on 2018/02/02 07:34:50 UTC

No Suggestions from SpellCheck when _text_ field tokenizer set to solr.NGramTokenizerFactory

I have configured the Solr managed-schema as follows.

The configuration below is for full-text search:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- <tokenizer class="solr.StandardTokenizerFactory"/> -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- <tokenizer class="solr.StandardTokenizerFactory"/> -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The following is the field type used for spell checking:

<fieldType name="text_spellcheck" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Below is the field on which I run the spell check:

<field name="title_txt_spellcheck_ja" type="text_spellcheck" omitNorms="true" indexed="true" stored="true"/>

Below is the full-text field:

<field name="_text_" type="text_general" multiValued="true" indexed="true" stored="false"/>

I copy text from another field into the spell check field to generate suggestions:

<copyField source="title_txt_ja" dest="title_txt_spellcheck_ja"/>

/spell?fl=id,title_txt_spellcheck_ja&wt=json&defType=edismax&q=te&spellcheck=on&spellcheck.count=10&spellcheck.collate=true&spellcheck.dictionary=title_txt_spellcheck_ja&spellcheck.collateExtendedResults=true&spellcheck.maxCollations=3
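
For readers who want to reproduce the request above, here is a minimal Python sketch that assembles the same query string. The host, port, and core name in BASE_URL are assumptions (they do not appear in the post); only the parameter names and values are taken from the request. It builds the URL but does not send it.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; the post only gives the "/spell" path.
BASE_URL = "http://localhost:8983/solr/mycore/spell"

# Parameters copied from the request in the post.
params = {
    "fl": "id,title_txt_spellcheck_ja",
    "wt": "json",
    "defType": "edismax",
    "q": "te",
    "spellcheck": "on",
    "spellcheck.count": 10,
    "spellcheck.collate": "true",
    "spellcheck.dictionary": "title_txt_spellcheck_ja",
    "spellcheck.collateExtendedResults": "true",
    "spellcheck.maxCollations": 3,
}


def build_spell_url(base_url, query_params):
    """Return base_url followed by the URL-encoded query string."""
    return base_url + "?" + urlencode(query_params)


url = build_spell_url(BASE_URL, params)
print(url)
```

Keeping the parameters in a dictionary makes it easy to change one value at a time (for example spellcheck.dictionary) while comparing responses.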

Everything was working fine until I changed
<tokenizer class="solr.StandardTokenizerFactory"/>
to
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
in the text_general field type.
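
To make the effect of that tokenizer change concrete, the sketch below lists the character n-grams of a string for minGramSize=2 and maxGramSize=10. It is a plain-Python illustration of character n-grams, not Lucene's implementation: the function name char_ngrams is hypothetical, and the real tokenizer may differ in details such as token order. It applies only to fields of type text_general (such as _text_), not to title_txt_spellcheck_ja, which still uses the standard tokenizer.

```python
def char_ngrams(text, min_gram=2, max_gram=10):
    """Return every substring of text whose length is in [min_gram, max_gram].

    Grams are listed by start position, then by increasing length.
    Illustration only; this is not the Lucene tokenizer itself.
    """
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break  # longer grams would run past the end of the text
            grams.append(text[start:start + size])
    return grams


print(char_ngrams("te"))    # ['te']
print(char_ngrams("test"))  # ['te', 'tes', 'test', 'es', 'est', 'st']
```

A single word such as "test" therefore becomes several overlapping tokens in the full-text field, whereas the standard tokenizer would keep it as one token.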

Any clues?

Regards
Anupam Bhattacharya

Re: No Suggestions from SpellCheck when _text_ field tokenizer set to solr.NGramTokenizerFactory

Posted by Alessandro Benedetti <a....@sease.io>.

How is the field type textSpell defined?
Can you describe in detail what is not working as expected?

Thanks



-----
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: No Suggestions from SpellCheck when _text_ field tokenizer set to solr.NGramTokenizerFactory

Posted by Anupam Bhattacharya <an...@gmail.com>.

The following is the spell check configuration in the
solrconfig.xml file.

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
          <str name="name">title_txt_spellcheck_ja</str>
          <str name="field">title_txt_spellcheck_ja</str>
          <str name="buildOnOptimize">true</str>
          <str name="buildOnCommit">true</str>
          <str name="spellcheckIndexDir">./spellchecker_en</str>
      </lst>
      <lst name="spellchecker">
          <str name="name">title_txt_spellcheck_en</str>
          <str name="field">title_txt_spellcheck_en</str>
          <str name="buildOnOptimize">true</str>
          <str name="buildOnCommit">true</str>
          <str name="spellcheckIndexDir">./spellchecker_de</str>
      </lst>
      ............
      ............
</searchComponent>

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <!-- Solr will use suggestions from both the 'default' spellchecker
           and from the 'wordbreak' spellchecker and combine them.
           collations (re-written queries) can include a combination of
           corrections from both spellcheckers -->
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

Regards,
Anupam

Re: No Suggestions from SpellCheck when _text_ field tokenizer set to solr.NGramTokenizerFactory

Posted by Alessandro Benedetti <a....@sease.io>.

Hi, how is your spellcheck dictionary
(spellcheck.dictionary=title_txt_spellcheck_ja) defined in
solrconfig.xml?

Regards




Re: No Suggestions from SpellCheck when _text_ field tokenizer set to solr.NGramTokenizerFactory

Posted by Anupam Bhattacharya <an...@gmail.com>.

Please help me understand the root cause of this behavior.

The text_spellcheck fieldType uses solr.StandardTokenizerFactory and has
no relation to the text_general field type, which uses the
solr.NGramTokenizerFactory tokenizer. Why, then, is the Solr spell check
service not working as expected?

Regards,
Anupam
