Posted to dev@lucene.apache.org by Andrea Gazzarini <a....@sease.io> on 2018/07/26 12:41:11 UTC
Synonyms + autoGeneratePhraseQueries
Hi, I'm still fighting with synonyms, and I have another question.
I don't understand the role, and the effect, of the
"autoGeneratePhraseQueries" attribute in a synonym context.
For example, suppose I have the following field type:
<fieldtype name="custom_text" class="solr.TextField"
autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
ignoreCase="false" expand="true"/>
</analyzer>
</fieldtype>
with the following synonym: *out of warranty,oow*
with the following query: *q=out of warranty*
The output query is exactly what I would expect: *(title:oow
PhraseQuery(title:"out of warranty"))*
Setting autoGeneratePhraseQueries to *false* (or, more precisely, omitting
the attribute declaration altogether), the output query is:
*(title:oow (+title:out +title:of +title:warranty))*
Such a query also matches unrelated text, like "I had to step out for
renewing the warranty of my device".
At first glance, this sounds completely wrong to me. Or rather, I can't
imagine a use case where that synonym decomposition would be useful. Is
this behavior intended? I would say that the query parser should always
generate a phrase query for multi-term synonyms, as in the first example
(i.e. autoGeneratePhraseQueries=true).
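To make the difference between the two expansions concrete, here is a toy model in Python (not Lucene code). `analyze` is only a rough stand-in for StandardTokenizer + LowerCaseFilter, and all helper names are hypothetical. It checks whether the top-level query (title:oow OR the expanded multi-term synonym) matches a text, with the synonym expanded either as a conjunction (autoGeneratePhraseQueries=false) or as a phrase (autoGeneratePhraseQueries=true). It models matching only, not scoring.

```python
import re


def analyze(text):
    """Rough stand-in for StandardTokenizer + LowerCaseFilter: lowercased word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_all(tokens, terms):
    """Like (+title:out +title:of +title:warranty): every term must occur, in any position."""
    return all(term in tokens for term in terms)


def matches_phrase(tokens, terms):
    """Like PhraseQuery(title:"out of warranty"): terms must be adjacent and in order."""
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def synonym_query_matches(text, auto_phrase):
    """Toy evaluation of (title:oow <expanded multi-term synonym>) as two SHOULD clauses."""
    tokens = analyze(text)
    multi_term = ["out", "of", "warranty"]
    if auto_phrase:
        expanded = matches_phrase(tokens, multi_term)  # autoGeneratePhraseQueries=true
    else:
        expanded = matches_all(tokens, multi_term)  # autoGeneratePhraseQueries=false
    return "oow" in tokens or expanded


if __name__ == "__main__":
    sentence = "I had to step out for renewing the warranty of my device"
    print(synonym_query_matches(sentence, auto_phrase=False))  # True: all three terms occur
    print(synonym_query_matches(sentence, auto_phrase=True))  # False: no adjacent "out of warranty"
```

The conjunction only requires the three terms somewhere in the field, which is exactly why the example sentence matches.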
Thanks in advance,
Andrea
Re: Synonyms + autoGeneratePhraseQueries
Posted by Andrea Gazzarini <a....@sease.io>.
Hi Michael,
the synonym is expanded at query time, so the resulting query is the same
regardless of whether q=oow or q=out of warranty:
1) autoGeneratePhraseQueries=*true*, sow=false, df=title
q=out of warranty => (title:oow PhraseQuery(title:"out of warranty"))
q=oow => (PhraseQuery(title:"out of warranty") title:oow)
2) autoGeneratePhraseQueries=*false (or missing)*, sow=false, df=title
q=out of warranty => ((+title:out +title:of +title:warranty) title:oow)
q=oow => (title:oow (+title:out +title:of +title:warranty))
As you can see, the clauses appear in a different order, but the resulting
queries are equivalent.
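A toy Python model of why the clause order does not matter: with expand="true", input equal to either side of the rule is rewritten to all alternatives, and a disjunction of SHOULD clauses is unordered, so a set is a fair representation. The tuple encoding of the rule and the `top_level_clauses` helper are hypothetical, and unlike the real SynonymGraphFilter this sketch only matches the whole query against the rule.

```python
# Hypothetical encoding of the synonym rule "out of warranty,oow":
# each alternative is a tuple of tokens.
SYNONYM_GROUP = (("out", "of", "warranty"), ("oow",))


def top_level_clauses(query_tokens):
    """Toy model: the set of SHOULD clauses produced for a query.

    With expand="true", input equal to any alternative of the group is
    rewritten to all alternatives. A frozenset is used because a disjunction
    of SHOULD clauses does not depend on clause order.
    """
    query = tuple(query_tokens)
    if query in SYNONYM_GROUP:
        return frozenset(SYNONYM_GROUP)
    return frozenset([query])


if __name__ == "__main__":
    print(top_level_clauses(["out", "of", "warranty"]) == top_level_clauses(["oow"]))  # True
```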
Sorry, I forgot a couple of details:
* I tried with Solr 7.1.0 and 7.4.0
* I'm using the following request handler, but nothing changes
if I switch to edismax
<requestHandler name="/def" class="solr.SearchHandler" default="true">
<lst name="defaults">
<bool name="sow">false</bool>
<str name="df">title</str>
<str name="defType">lucene</str>
<bool name="debug">true</bool>
</lst>
</requestHandler>
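To repeat the comparison with either parser, the handler defaults can also be restated per request. The Python sketch below only builds such a request URL; the host, port and core name (`mycore`) are placeholders and `build_debug_url` is a hypothetical helper. `sow`, `df`, `defType` and `debug` are the same parameters declared in the handler defaults above.

```python
from urllib.parse import urlencode

# Placeholder base URL: host, port and core name ("mycore") are assumptions.
BASE_URL = "http://localhost:8983/solr/mycore/def"


def build_debug_url(q, def_type="lucene"):
    """Hypothetical helper: a /def request that restates the handler defaults explicitly."""
    params = {
        "q": q,
        "sow": "false",  # don't split on whitespace before analysis, so multi-word synonyms can match
        "df": "title",
        "defType": def_type,  # "lucene" as in the defaults, or "edismax"
        "debug": "true",  # the response's debug section includes the parsed query
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Same query, once per parser, to compare the parsed queries in the debug output.
    for parser in ("lucene", "edismax"):
        print(build_debug_url("out of warranty", def_type=parser))
```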
On 27/07/18 04:32, Michael Sokolov wrote:
> Did you mean q=oow in your example? As written, I don't see how there
> is a problem.
Re: Synonyms + autoGeneratePhraseQueries
Posted by Michael Sokolov <ms...@gmail.com>.
Did you mean q=oow in your example? As written, I don't see how there is a
problem.