Posted to dev@lucene.apache.org by Andrea Gazzarini <a....@sease.io> on 2018/07/26 12:41:11 UTC

Synonyms + autoGeneratePhraseQueries

Hi, still fighting with synonyms, I have another question.

I don't understand the role, or the effect, of the 
"autoGeneratePhraseQueries" attribute in a synonym context.
Suppose I have the following field type:

<fieldtype name="custom_text" class="solr.TextField" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
                ignoreCase="false" expand="true"/>
    </analyzer>
</fieldtype>

with the following synonym: *out of warranty,oow*

with the following query: *q=out of warranty*

The output query is exactly what I would expect: *(title:oow 
PhraseQuery(title:"out of warranty"))*

With autoGeneratePhraseQueries set to *false* (or rather, with the 
attribute declaration omitted altogether), the output query is:

*(title:oow (+title:out +title:of +title:warranty))*

This matches documents such as "I had to step out for renewing the 
warranty of my device".
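
To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model of the two parsed queries, not Lucene or Solr code: the `tokenize` function is only a rough stand-in for StandardTokenizer plus LowerCaseFilter, and all function names are made up for illustration.

```python
import re

def tokenize(text):
    # Rough stand-in for StandardTokenizer + LowerCaseFilter (an assumption,
    # not the real analysis chain): lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_terms(doc_tokens, terms):
    # Conjunction, like (+title:out +title:of +title:warranty):
    # every term must occur somewhere in the field, in any order.
    return all(term in doc_tokens for term in terms)

def matches_phrase(doc_tokens, terms):
    # Exact phrase, like title:"out of warranty": the terms must occur
    # adjacently and in order (no slop).
    n = len(terms)
    return any(doc_tokens[i:i + n] == terms
               for i in range(len(doc_tokens) - n + 1))

def matches_query(text, phrase=True):
    # Models (title:oow OR <multi-term side>), where the multi-term side is
    # a phrase when phrase=True and a conjunction of terms otherwise.
    tokens = tokenize(text)
    multi = ["out", "of", "warranty"]
    if phrase:
        multi_match = matches_phrase(tokens, multi)
    else:
        multi_match = matches_all_terms(tokens, multi)
    return "oow" in tokens or multi_match

doc = "I had to step out for renewing the warranty of my device"
print(matches_query(doc, phrase=False))  # True: all three terms are present
print(matches_query(doc, phrase=True))   # False: the terms are not adjacent
```

Under these assumptions, the conjunction form accepts the document above because "out", "of" and "warranty" all appear somewhere in it, while the phrase form rejects it.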

This, at first glance, seems completely wrong to me. Or rather, I can't 
imagine a use case where that synonym decomposition would be useful. 
Is this intended? I would say that the query parser should always 
generate a phrase query for multi-term synonyms, as in the first 
example (i.e. autoGeneratePhraseQueries=true).

Thanks in advance,
Andrea

Re: Synonyms + autoGeneratePhraseQueries

Posted by Andrea Gazzarini <a....@sease.io>.
Hi Michael,
the synonym is expanded at query time, so the resulting query is the same 
regardless of whether q=oow or q=out of warranty:

1) autoGeneratePhraseQueries=*true*, sow=false, df=title

q=out of warranty => (title:oow PhraseQuery(title:"out of warranty"))
q=oow             => (PhraseQuery(title:"out of warranty") title:oow)

2) autoGeneratePhraseQueries=*false (or missing)*, sow=false, df=title

q=out of warranty => ((+title:out +title:of +title:warranty) title:oow)
q=oow             => (title:oow (+title:out +title:of +title:warranty))

As you can see, the clauses are inverted but the resulting queries are 
equivalent.

Sorry, I forgot a couple of details:

  * I tried with Solr 7.1.0 and 7.4.0
  * I'm using the following request handler, but things are no different
    if I switch to edismax

<requestHandler name="/def" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <bool name="sow">false</bool>
        <str name="df">title</str>
        <str name="defType">lucene</str>
        <bool name="debug">true</bool>
    </lst>
</requestHandler>
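
For anyone wanting to reproduce the parsed queries above, here is a small Python sketch that only builds the request URL (it does not contact a server). The host, port and core name (`localhost:8983`, `mycore`) are hypothetical placeholders; the `/def` handler and the parameter values are the ones from the configuration above, passed explicitly here even though the handler already sets them as defaults.

```python
from urllib.parse import urlencode

def build_debug_url(query, base="http://localhost:8983/solr/mycore/def"):
    # 'base' is a hypothetical host/core; only the "/def" handler name
    # comes from the configuration in this message.
    params = {
        "q": query,
        "df": "title",        # default field
        "sow": "false",       # do not split on whitespace before analysis
        "defType": "lucene",  # standard query parser
        "debug": "true",      # include the parsed query in the response
    }
    return base + "?" + urlencode(params)

print(build_debug_url("out of warranty"))
print(build_debug_url("oow"))
```

The parsed query should then appear in the debug section of the response, which is where the outputs quoted above come from.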



On 27/07/18 04:32, Michael Sokolov wrote:
> Did you mean q=oow in your example? As written, I don't see how there 
> is a problem.


Re: Synonyms + autoGeneratePhraseQueries

Posted by Michael Sokolov <ms...@gmail.com>.
Did you mean q=oow in your example? As written, I don't see how there is a
problem.
