Posted to solr-user@lucene.apache.org by Sunil Srinivasan <su...@aol.in> on 2019/07/18 03:44:06 UTC

Solr edismax parser with multi-word synonyms

I have enabled the SynonymGraphFilter in my field configuration in order to support multi-word synonyms (I am using Solr 7.6). Here is my field configuration:
<fieldType name="text_syn" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>

    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymGraphFilterFactory" ignoreCase="true" synonyms="synonyms.txt"/>
    </analyzer>
</fieldType>

<field name="title" type="text_syn" indexed="true"/>

And this is my synonyms.txt file:
frozen dinner,microwave food
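A comma-separated line like this declares its entries equivalent, so at query time the two-word input "frozen dinner" gains an alternative path, "microwave food", in the token graph that SynonymGraphFilter produces. The Python sketch below is a deliberately simplified toy model of that idea, not Lucene's actual algorithm. It assumes the whole query must equal one entry on the line, and it lowercases everything to mimic `ignoreCase="true"`. The function names are hypothetical, and the order of the returned paths is not necessarily the order Solr prints.

```python
def parse_synonym_line(line):
    """Split an equivalence line such as 'frozen dinner,microwave food'
    into token tuples, lowercased to mimic ignoreCase="true"."""
    return [tuple(entry.strip().lower().split()) for entry in line.split(",")]


def expand_paths(query, synonym_lines):
    """Return the alternative token paths for a query (toy model).

    Simplification: only a whole-query match is handled. If the entire
    query equals one entry of an equivalence line, every entry on that
    line becomes a path; otherwise the query's own tokens are the only path.
    """
    tokens = tuple(query.lower().split())
    for line in synonym_lines:
        group = parse_synonym_line(line)
        if tokens in group:
            return [list(entry) for entry in group]
    return [list(tokens)]


rules = ["frozen dinner,microwave food"]
print(expand_paths("blue shirt", rules))     # no rule applies: one path
print(expand_paths("frozen dinner", rules))  # two alternative paths
```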

Scenario 1: blue shirt (query with no synonyms)

Here is my first Solr query:
http://localhost:8983/solr/base/search?q=blue+shirt&qf=title&defType=edismax&debugQuery=on

And this is the parsed query I see in the debug output:
+((title:blue) (title:shirt))
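Both scenarios can also be reproduced outside the browser. The following Python sketch builds the same request and pulls the parsed query out of the debug section. It assumes the core name `base` and the custom `/search` handler from the URLs above. It also adds `wt=json` so the response can be decoded, and it assumes the parsed query appears under the `parsedquery` key inside the `debug` section. Check a real response if your handler returns something different. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: core "base" with a custom "/search" handler, as in the URLs above.
SOLR_SEARCH_URL = "http://localhost:8983/solr/base/search"


def build_debug_url(q, qf="title", base_url=SOLR_SEARCH_URL):
    """Build an edismax request with debug output enabled, returned as JSON."""
    params = {
        "q": q,
        "qf": qf,
        "defType": "edismax",
        "debugQuery": "on",
        "wt": "json",  # added so the response can be decoded with json.load
    }
    return base_url + "?" + urlencode(params)


def extract_parsed_query(response):
    """Return the parsed query string from a decoded Solr JSON response."""
    return response["debug"]["parsedquery"]


def fetch_parsed_query(q, qf="title"):
    """Send the request to a running Solr instance and return its parsed query."""
    with urlopen(build_debug_url(q, qf)) as resp:
        return extract_parsed_query(json.load(resp))
```

With Solr running, calling `fetch_parsed_query("blue shirt")` and `fetch_parsed_query("frozen dinner")` should return the strings reported in the debug output.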

Scenario 2: frozen dinner (query with synonyms)

Now, here is my second Solr query:
http://localhost:8983/solr/base/search?q=frozen+dinner&qf=title&defType=edismax&debugQuery=on

And this is the parsed query I see in the debug output:
+(((+title:microwave +title:food) (+title:frozen +title:dinner)))

Why does the first query look for documents containing at least one of the two query tokens, whereas the second query looks for documents containing both of them? I would understand if it required both tokens of the synonym (i.e. both microwave and food) to avoid the sausagization problem. But I would at least like to get partial matches on the original query (i.e. it should also match documents containing just the token 'dinner').

Does anyone know why the behavior differs between queries with and without synonyms? And how could I work around this if I want partial matches on queries that also have synonyms?

Ideally, I would like the parsed query in the second case to be:
+(((+title:microwave +title:food) (title:frozen title:dinner)))
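The difference between these parsed queries comes down to Lucene's boolean clause rules. Clauses prefixed with `+` are required. A group made only of unprefixed (optional) clauses matches if at least one clause matches. Once a group has required clauses, its optional clauses only affect scoring. The Python toy evaluator below applies those rules to the three queries above. It is a sketch under stated assumptions, not Solr code: each document is a set of title tokens, field names and edismax's extra wrapper groups are flattened, and scoring and any minimum-should-match setting are ignored.

```python
def must(*clauses):
    """A group of '+' (required) clauses."""
    return ("bool", [("MUST", c) for c in clauses])


def should(*clauses):
    """A group of unprefixed (optional) clauses."""
    return ("bool", [("SHOULD", c) for c in clauses])


def matches(clause, doc_terms):
    """Return True if a document (a set of title tokens) matches the clause."""
    if isinstance(clause, str):
        return clause in doc_terms
    _, subclauses = clause
    required = [c for occur, c in subclauses if occur == "MUST"]
    optional = [c for occur, c in subclauses if occur == "SHOULD"]
    if not all(matches(c, doc_terms) for c in required):
        return False
    if required:
        # Once the required clauses match, optional ones only affect scoring.
        return True
    # A group made only of optional clauses needs at least one of them.
    return any(matches(c, doc_terms) for c in optional)


# +((title:blue) (title:shirt))
scenario_1 = must(should("blue", "shirt"))
# +(((+title:microwave +title:food) (+title:frozen +title:dinner)))
scenario_2 = must(should(must("microwave", "food"), must("frozen", "dinner")))
# Desired: +(((+title:microwave +title:food) (title:frozen title:dinner)))
desired = must(should(must("microwave", "food"), should("frozen", "dinner")))

for title in ({"blue"}, {"dinner"}, {"frozen", "dinner"}, {"microwave"}):
    print(sorted(title),
          matches(scenario_1, title),
          matches(scenario_2, title),
          matches(desired, title))
```

In this model a title containing only "dinner" fails the actual Scenario 2 query but passes the desired one, which is exactly the partial match being asked for.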

I'd appreciate any help with this. Thanks!

Re: Solr edismax parser with multi-word synonyms

Posted by Erick Erickson <er...@gmail.com>.
This is not a phrase query; rather, it requires either pair of words
to appear in the title.

You’ve told it that “frozen dinner” and “microwave food” are synonyms,
so it’s looking for both the words “microwave” and “food” in the title field,
or “frozen” and “dinner” in the title field.
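The structure described here, with either pair allowed and both words of a pair required, can be reproduced with a small string-building sketch in Python. It only illustrates the shape of the debug output and is not Solr's query-building code. The function names are hypothetical, and the paths are listed in the order they appear in the debug output. The result corresponds to the inner part of `+(((+title:microwave +title:food) (+title:frozen +title:dinner)))`. The outer `+(...)` wrapper in the real output comes from edismax and is not modelled.

```python
def path_to_clause(path, field):
    """Render one alternative path: a single token is a plain term, while a
    multi-token path becomes a group in which every term is required (+)."""
    terms = [f"{field}:{token}" for token in path]
    if len(terms) == 1:
        return terms[0]
    return "(" + " ".join("+" + term for term in terms) + ")"


def render_graph_query(paths, field="title"):
    """Join the alternative paths as optional clauses: any one path may match."""
    return "(" + " ".join(path_to_clause(path, field) for path in paths) + ")"


paths = [["microwave", "food"], ["frozen", "dinner"]]
print(render_graph_query(paths))
```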

You’d see the same thing with single-word synonyms, albeit a little less
confusingly.


Best,
Erick


> On Jul 18, 2019, at 1:01 AM, kshitij tyagi <ks...@gmail.com> wrote:
> [...]


Re: Solr edismax parser with multi-word synonyms

Posted by kshitij tyagi <ks...@gmail.com>.
Hi Sunil,

1. As you have added "microwave food" as a multi-word synonym of
"frozen dinner", the edismax parser finds your synonym in the file and
treats your query as a phrase query.

This is the reason you are seeing the parsed query as +(((+title:microwave
+title:food) (+title:frozen +title:dinner))): "frozen dinner" is treated
as a phrase here.

If you want partial matches on your query, you can use the line
"frozen dinner,microwave food,microwave,food" in your synonym file, and you
will see the parsed query as:
"+(((+title:microwave +title:food) title:microwave title:food
(+title:frozen +title:dinner)))"
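To see what this change buys, the parsed query quoted above can be read as a list of alternative paths. A document, modelled as a set of title tokens, matches if it contains every token of at least one path. This toy Python check, which ignores scoring and edismax's wrapper groups, shows a side effect. A title containing only "food" now matches, but a title containing only "dinner" still does not, because the `(+title:frozen +title:dinner)` clause still requires both words.

```python
def matches_any_path(paths, doc_terms):
    """A document matches if it contains every token of at least one path."""
    return any(all(token in doc_terms for token in path) for path in paths)


# Paths read off the parsed query quoted above for the extended synonym line.
extended_paths = [["microwave", "food"], ["microwave"], ["food"], ["frozen", "dinner"]]

for title in ({"dinner"}, {"food"}, {"frozen", "dinner"}):
    print(sorted(title), matches_any_path(extended_paths, title))
```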
Another option is to write your own custom query parser and use it as a
plugin.

Hope this helps!!

kshitij


On Thu, Jul 18, 2019 at 9:14 AM Sunil Srinivasan <su...@aol.in> wrote:

> [...]