Posted to solr-user@lucene.apache.org by Yutaka Nakajima <na...@gmail.com> on 2016/11/17 11:21:37 UTC

Question about synonym's behavior with NGramTokenizer

Hi,

I have a question about Solr synonym's behavior with NGramTokenizer.

I'm using the settings below, but they do not work well: synonyms are
not applied. Could someone please help me?

    <fieldType name="text_2gram_n_i" class="solr.TextField"
               positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer type="index">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms-index.txt"
                tokenizerFactory="solr.NGramTokenizerFactory"
                tokenizerFactory.minGramSize="2"
                tokenizerFactory.maxGramSize="2"
                luceneMatchVersion="3.3"
                ignoreCase="true" expand="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
      </analyzer>
    </fieldType>

Thanks,
Yutaka Nakajima

Re: Question about synonym's behavior with NGramTokenizer

Posted by Erick Erickson <er...@gmail.com>.

Wouldn't it be better to put the synonym filter in front of the
NGramTokenizerFactory and just let the n-gram step take care of
ngramming the injected tokens just like the other tokens, like this?

    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms-index.txt"
            ignoreCase="true" expand="true"/>

    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
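
A minimal sketch of one way to get that ordering, offered as an
assumption rather than a tested fix. A Solr analyzer runs its tokenizer
before any of its filters, so here the n-gram step is moved into
solr.NGramFilterFactory, which lets it run after the synonym filter. The
field type name text_2gram_syn and the choice of
solr.WhitespaceTokenizerFactory are hypothetical and not from the
original post; text without whitespace between words would need a
different initial tokenizer.

```xml
<!-- Hypothetical field type: synonyms are expanded on whole tokens first,
     then every token (original and injected) is split into 2-grams. -->
<fieldType name="text_2gram_syn" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <!-- Assumption: whitespace-delimited input -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms-index.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <!-- No synonym expansion at query time, mirroring the original config -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>
```

Because the synonym filter sees whole tokens in this layout, the
entries in synonyms-index.txt can be matched as written instead of
having to line up with 2-gram boundaries.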

That said, I urge you to use the admin/analysis page to see the effects
of the various tweaks you can make to the analysis chain; it'll help
make sense of all the interactions. Hint: unless you care to see _lots_
of detail, uncheck the "verbose" checkbox.

Also, please describe exactly _what_ doesn't work. We need to know
what behavior you expect, what behavior you're seeing and, if
possible, some example data, queries and results you'd like to see.

Best,
Erick
