Posted to solr-user@lucene.apache.org by atin janki <at...@gmail.com> on 2020/03/16 14:46:58 UTC

Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Hello everyone,

I am using Solr 8.3.

After adding the Synonym Graph Filter to my managed-schema file, I
noticed that if the query string contains a multi-word synonym, the
whole multi-word synonym is treated as a single term and is not split
into its individual words, which suppresses the default search behaviour.

I am using StandardTokenizer.

Below is a snippet from my managed-schema file -

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


Here "soap powder" is the search query, which is also a multi-word
synonym in the synonym file, listed as -

s(104254535,1,'soap powder',n,1,1).
s(104254535,2,'built-soap powder',n,1,0).
s(104254535,3,'washing powder',n,1,0).
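
For reference, these entries look like the WordNet prolog layout, where the first field is a synset ID shared by all members of a synonym set. As far as I know, SynonymGraphFilterFactory has a `format` attribute that selects between `solr` (the default, comma-separated) and `wordnet`. The following is only a minimal Python sketch, assuming one `s(...)` entry per line; the function name and the regular expression are illustrative and not part of Solr. It groups entries by synset ID and emits one comma-separated line per group:

```python
import re
from collections import OrderedDict

# Matches lines such as: s(104254535,1,'soap powder',n,1,1).
# Only the synset ID (group 1) and the quoted word (group 2) are captured.
_WN_LINE = re.compile(r"^s\((\d+),\d+,'(.*)',[a-z],\d+,\d+\)\.$")


def wordnet_to_solr(lines):
    """Group WordNet-prolog-style entries by synset ID.

    Returns one comma-separated string per synset that has at least two
    members. Lines that do not match the assumed layout are skipped.
    """
    groups = OrderedDict()
    for line in lines:
        match = _WN_LINE.match(line.strip())
        if not match:
            continue
        synset_id, word = match.groups()
        groups.setdefault(synset_id, []).append(word)
    return [", ".join(words) for words in groups.values() if len(words) > 1]


if __name__ == "__main__":
    sample = [
        "s(104254535,1,'soap powder',n,1,1).",
        "s(104254535,2,'built-soap powder',n,1,0).",
        "s(104254535,3,'washing powder',n,1,0).",
    ]
    for row in wordnet_to_solr(sample):
        print(row)
```

Seeing the three entries as a single comma-separated group makes it easier to check which terms the filter is expected to treat as equivalent.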


The screenshots below illustrate the problem -

Without the Synonym Graph Filter => 2 docs returned (screenshot at the
URL below) -

https://ibb.co/zQXx7mV

With the Synonym Graph Filter => 2 docs expected, only 1 returned
(screenshot at the URL below) -

https://ibb.co/tp04Rzw


Has anyone experienced this before? If so, is there a workaround?
Or is this expected behaviour?

Regards,
Atin Janki

Re: Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
I don't think you can synonym-ize both the multi-token phrase and each individual token in the multi-token phrase at the same time. But anyone else feel free to chime in! 

Best,
Audrey Lorberfeld


Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Posted by atin janki <at...@gmail.com>.
I aim to achieve an expansion like -

Synonym(soap powder) + Synonym(soap) + Synonym (powder)


which is not happening because of the way synonym expansion is currently
done.

At the moment, using the Synonym Graph Filter with StandardTokenizer and
sow=false expands as -

 Synonym(soap powder)

because "soap powder" is a multi-word synonym present in the synonym file.

Using sow = true in the above setting will give -

Synonym(soap) + Synonym (powder)
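
One possible way to approximate the combined expansion on the client side, which nobody in this thread has confirmed, would be to build the boolean query explicitly before sending it to Solr. The sketch below is only an illustration in Python: the function name, the field name `text`, and the idea of passing the phrase synonyms in by hand are all assumptions, not part of the setup described above.

```python
def build_combined_query(phrase, phrase_synonyms, field="text"):
    """Return a Lucene-style query that ORs together:
    the phrase itself, each phrase-level synonym, and each single word.

    `field` is a hypothetical field name; replace it with a real one.
    """
    def quote(text):
        # Wrap in double quotes so the parser treats it as a phrase query.
        return '"' + text.replace('"', '\\"') + '"'

    clauses = [quote(phrase)]
    clauses += [quote(synonym) for synonym in phrase_synonyms]
    # Add the individual words so each one can still match (and be
    # expanded by the query analyzer) on its own.
    clauses += phrase.split()
    return f"{field}:({' OR '.join(clauses)})"


if __name__ == "__main__":
    print(build_combined_query(
        "soap powder", ["washing powder", "built-soap powder"]))
```

This moves the phrase-level expansion out of the analyzer, so the synonym list has to be available to the client, which may or may not be acceptable.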



Best Regards,
Atin Janki



Re: Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
To confirm, you want a synonym like "soap powder" to map onto synonyms like "hand soap," "hygiene products," etc? As in, more of a cognitive synonym mapping where you feed synonyms that only apply to the multi-token phrase as a whole?


Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Posted by atin janki <at...@gmail.com>.
Using sow=true does split the query on whitespace, but then it no longer
looks for synonyms of "soap powder"; instead it expands synonyms separately
for "soap" and "powder".



Best Regards,
Atin Janki



Re: Using Synonym Graph Filter with StandardTokenizer does not tokenize the query string if it has multi-word synonym

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
Have you set sow=true in your search handler? I know that we have it set to false (sow = split on whitespace) because we WANT multi-token synonyms retained as multiple tokens. 
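
To make the difference concrete, here is a toy Python model of the two settings. It is a sketch only and not how Solr implements query analysis: the rule table, the function names, and the greedy longest-match lookup are assumptions made for illustration. The single rule mirrors the "soap powder" synonym set from the original post.

```python
# Toy synonym rules keyed by token tuples (illustrative only).
RULES = {("soap", "powder"): ["washing powder", "built-soap powder"]}


def expand(tokens):
    """Greedy longest-match synonym lookup over one token stream."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in RULES:
                out.append(" ".join(key))
                out.extend(RULES[key])
                i += n
                break
        else:
            # No rule matched at this position: keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def analyze(query, sow):
    """Model sow=true as analyzing each whitespace-separated chunk alone."""
    chunks = query.lower().split()
    if sow:
        result = []
        for chunk in chunks:
            result.extend(expand([chunk]))
        return result
    # sow=false: the analyzer sees the whole token stream at once.
    return expand(chunks)


if __name__ == "__main__":
    print(analyze("soap powder", sow=False))
    print(analyze("soap powder", sow=True))
```

In this model the multi-word rule can only fire when both words reach the synonym lookup together, which is why splitting on whitespace first leaves "soap" and "powder" unexpanded as a phrase.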

