Posted to solr-user@lucene.apache.org by Nicky Mastin <Ni...@vin.com> on 2018/10/25 22:24:13 UTC

Edismax query returning the same number of results using AND as it does with OR

Oddity with edismax and queries involving boolean operators.  Here's the 
"parsedquery_toString" from two different queries:
input:  "dog AND kiwi":
https://apaste.info/gaQl
input:  "dog OR kiwi":
https://apaste.info/sBwa
Both queries return the same number of results (389).  The query with OR was 
expected to have a much higher numFound.  Those pastes have a one week 
lifetime.
The two parsed queries are almost identical.  The AND query has a couple of 
extra plus signs compared to the OR query, and the OR query has a ~2 after a 
right paren that the AND query doesn't have.  I'm at a loss as to what this 
all means, except to say that it didn't turn out as expected.
Should the two queries have returned different numbers of results?  If not, 
why is that the case?
Here is the output from echoParams=all on the OR query:
<str name="spellcheck.collateExtendedResults">true</str>
<str name="df">text</str>
<str name="hl">true</str>
<str name="hl.bs.type">LINE</str>
<str name="f.sourceid.facet.method">enum</str>
<str name="spellcheck.maxCollations">3</str>
<str name="tie"> 0.4</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="qf">
title^100 kw1ranked^100 kw1^100 keywordsranked_bm25_no_norms^50 
keywords_bm25_no_norms^50 authors text description species
</str>
<arr name="f.year.facet.range.other">
<str>before</str>
<str>after</str>
</arr>
<str name="hl.fl">subdocuments,keywords,authors</str>
<str name="mm">3<-1 6<-3 9<30%</str>
<str name="f.year.facet.range.hardend">true</str>
<str name="hl.formatter">html</str>
<str name="spellcheck">on</str>
<arr name="boost">
<str>
max(recip(ms(NOW/DAY+1YEAR,dateint),3.16E-11,10,6),documentdatefix)
</str>
<str>rank</str>
</arr>
<str name="debugQuery">true</str>
<str name="f.sourceid.facet.limit">1000</str>
<str name="hl.boundaryScanner">breakIterator</str>
<str name="spellcheck.collate">true</str>
<str name="facet.range">year</str>
<str name="f.year.facet.range.end">2015</str>
<str name="spellcheck.dictionary">spell_file</str>
<str name="indent">true</str>
<str name="echoParams">all</str>
<str name="fl">
id,title,description,url,objecttypeid,contexturl,defaultsourceid,sourceid,score
</str>
<str name="hl.requireFieldMatch">false</str>
<str name="hl.fragsize">100</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="f.year.facet.range.gap">5</str>
<str name="hl.simple.pre">&lt;strong&gt;</str>
<arr name="facet.query">
<str>
{!ex=dt key="Last10yr"}dateint:[NOW/YEAR-10YEARS TO *]
</str>
<str>
{!ex=dt key="Last5yr"}dateint:[NOW/YEAR-5YEARS TO *]
</str>
<str>
{!ex=dt key="Last3yr"}dateint:[NOW/YEAR-3YEARS TO *]
</str>
<str>
{!ex=dt key="Last1yr"}dateint:[NOW/YEAR-1YEAR TO *]
</str>
</arr>
<str name="defType">edismax</str>
<str name="hl.mergeContiguous">false</str>
<str name="f.folderid.facet.method">enum</str>
<str name="wt">xml</str>
<str name="hl.highlightMultiTerm">true</str>
<str name="q.alt">*:*</str>
<arr name="facet.field">
<str>folderid</str>
<str>sourceid</str>
<str>speciesid</str>
<str>admin</str>
</arr>
<str name="f.speciesid.facet.method">enum</str>
<str name="json.nl">map</str>
<str name="start">0</str>
<str name="hl.usePhraseHightligher">true</str>
<str name="rows">25</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.extendedResults">true</str>
<str name="q">dog OR kiwi</str>
<str name="f.year.facet.range.start">1970</str>
<str name="hl.simple.post">&lt;/strong&gt;</str>
<str name="pf">
title~20^5000 keywordsranked_bm25_no_norms~20^5000 kw1ranked~10^5000 
keywords_bm25_no_norms~20^1500 kw1~10^500 authors^250 text~20^1000 
text~100^500 description^1
</str>
<str name="facet.mincount">1</str>
<str name="hl.method">unified</str>
<str name="spellcheck.count">10</str>
<str name="pf3">
title~22^1000 keywordsranked_bm25_no_norms~22^1000 
keywords_bm25_no_norms~12^500 kw1ranked~12^100 kw1~12^100 text~22^100
</str>
<str name="pf2">authors~11 species~11</str>
<str name="facet">on</str>
If anyone has any ideas about whether this behavior is expected or 
unexpected, I'd appreciate hearing them.  It is Solr 7.1.0 with a patch for 
SOLR-12243 applied.
There might be information that would be helpful that isn't provided.  If 
there is something else needed, please let me know, so I can provide it.


Re: Edismax query returning the same number of results using AND as it does with OR

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
Hi,

What is the full query path or URL that you send for the query?
And how is edismax configured in your solrconfig.xml?

Regards,
Edwin


Re: Edismax query returning the same number of results using AND as it does with OR

Posted by Shawn Heisey <ap...@elyograg.org>.
Followup:

I had a theory that Nicky tested, and I think what was observed confirms the theory.

TL;DR:

In previous versions, I think there was a bug where the presence of boolean operators caused edismax to ignore the mm parameter, and only rely on the boolean operator(s).

After that bug got fixed, mm applies to any SHOULD clauses in the query. A query of "a OR b" has two SHOULD clauses, and the mm value in this request (3<-1 6<-3 9<30%) requires every clause to match when there are three or fewer of them, so the query is effectively the same as "a AND b".
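
To make the arithmetic concrete, here is a minimal Python sketch of how a conditional mm spec like the one in this request is evaluated. It assumes the mm semantics described in the Solr reference guide and is not Solr's actual code; the function names are hypothetical.

```python
def _simple_mm(clause_count, expr):
    """Evaluate one unconditional mm expression such as '2', '-1', '30%' or '-25%'."""
    if expr.endswith("%"):
        percent = int(expr[:-1])
        # Percentages are truncated toward zero.
        amount = abs(clause_count * percent) // 100
        if percent < 0:
            amount = -amount
    else:
        amount = int(expr)
    # A negative value means "all clauses except this many".
    required = clause_count + amount if amount < 0 else amount
    # Clamp the result to the range 0..clause_count.
    return max(0, min(clause_count, required))


def min_should_match(clause_count, spec):
    """Return how many of clause_count optional (SHOULD) clauses must match."""
    # At or below the first threshold, every clause is required.
    required = clause_count
    for part in spec.split():
        if "<" not in part:
            # Plain spec with no condition, e.g. "0" or "75%".
            return _simple_mm(clause_count, part)
        threshold, expr = part.split("<", 1)
        if clause_count <= int(threshold):
            break
        required = _simple_mm(clause_count, expr)
    return required


if __name__ == "__main__":
    mm = "3<-1 6<-3 9<30%"
    for n in (2, 4, 7, 10):
        print(n, "clauses ->", min_should_match(n, mm), "required")
```

For the two SHOULD clauses in "dog OR kiwi" this gives 2, so both terms must match, while `min_should_match(2, "0")` gives 0.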

A potential workaround that appears to work: Detect when the query contains a boolean operator, and in that situation, send mm=0 with the query. Alternatively, do that only when the query contains "OR". Things work right with AND and NOT because these don't produce SHOULD clauses.
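
A rough client-side sketch of that workaround in Python follows. The helper name `build_params` and the parameter dict are hypothetical. It only looks for an uppercase OR as a standalone token, so it assumes lowercase operators are not in use, and it would also trigger on an OR inside a quoted phrase.

```python
import re

# Uppercase OR as a standalone, whitespace-delimited token.
_OR_OPERATOR = re.compile(r"(?<!\S)OR(?!\S)")


def build_params(user_query, base_params=None):
    """Return request parameters, adding mm=0 when the query contains OR."""
    params = dict(base_params or {})
    params["q"] = user_query
    if _OR_OPERATOR.search(user_query):
        # Override the handler's mm so the SHOULD clauses stay optional.
        params["mm"] = "0"
    return params


if __name__ == "__main__":
    print(build_params("dog OR kiwi", {"defType": "edismax"}))
    print(build_params("dog AND kiwi", {"defType": "edismax"}))
```

When no mm is added, the request falls back to whatever mm is configured on the handler, which is the behavior wanted for queries without OR.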

Thanks,
Shawn



