Posted to solr-user@lucene.apache.org by amitesh116 <am...@gmail.com> on 2012/08/22 14:13:54 UTC

Edismax parser weird behavior

Hi, I am seeing two strange behaviors in edismax.
edismax is configured to default to OR (using mm=0).
There are 700 records in total.
1. Searching for *auto* returns *50 results*.
   Searching for *NOT auto* returns *651 results*.
Mathematically, *NOT auto* should return only 650 results.

2. Searching for *auto* returns 50 results.
   Searching for *car* returns *100 results*.
   Searching for *auto and car* returns *10 results*.
Since we have set mm=0, the query should behave like OR, so *auto and car*
should return at least 100 results.

Please help me understand these two issues. Is this normal behavior? Do I
need to tweak the query, or do I need to look into the config or schema XML
files?

Thanks in Advance



--
View this message in context: http://lucene.472066.n3.nabble.com/Edismax-parser-weird-behavior-tp4002626.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Edismax parser weird behavior

Posted by Jack Krupansky <ja...@basetechnology.com>.
Don't have an immediate answer for you on #1, but for #2, "mm" does not 
override explicit operators - "and" - it only applies to terms that are not 
the immediate operand of an explicit operator. Note that by default 
lower-case operators are enabled in edismax - "and" is treated as "AND" - 
you can set "lowercaseOperators=false" to avoid that.
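
To make the difference concrete, here is a minimal Python sketch that builds
the two request URLs. The host, port, and core name (localhost:8983,
collection1) are assumptions for illustration only; defType, q, mm, and
lowercaseOperators are the request parameters discussed above.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core to your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def edismax_url(query, lowercase_operators):
    """Build an edismax request URL with mm=0 and the given operator setting."""
    params = {
        "defType": "edismax",
        "q": query,
        "mm": "0",
        # "false" makes a lower-case "and" an ordinary term rather than AND.
        "lowercaseOperators": "true" if lowercase_operators else "false",
        "rows": "0",  # only the hit count is of interest here
    }
    return SOLR_SELECT + "?" + urlencode(params)


# With lower-case operators enabled, "and" is parsed as the AND operator.
print(edismax_url("auto and car", lowercase_operators=True))
# With them disabled, "and" is just another term, so mm applies to it.
print(edismax_url("auto and car", lowercase_operators=False))
```

Comparing the hit counts of the two requests should show whether the
explicit operator is what narrows the result set.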

-- Jack Krupansky



Re: Edismax parser weird behavior

Posted by Erick Erickson <er...@gmail.com>.
Right, examine the parsed query carefully and
you'll see that the semantics are quite different.
At a guess, "auto" appears in more than one
field. So your first query is saying
"return any documents for which 'auto' appears
in any of the fields (text, media_transcript, ...)".

Your NOT clause is saying "return the documents
for which 'auto' does NOT appear in any of the fields
(text, media_transcript, ...)".

You can't expect these to add up perfectly. Try using
something other than edismax, or explicitly reference
the field, as in
q=text:auto
or
q=-text:auto
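
As a rough illustration of the two readings, here is a toy Python model. The
documents, field names, and tokens are made up, and matching is plain token
membership, so it ignores analysis chains and everything else a real index
does. It only shows how "any field", "no field", and single-field counts
relate to each other as sets.

```python
# Toy index: each document maps field name -> set of tokens (hypothetical data).
DOCS = {
    1: {"title": {"auto", "repair"}, "text": {"auto", "shop"}},
    2: {"title": {"car"}, "text": {"used", "auto"}},
    3: {"title": {"bike"}, "text": {"city", "bike"}},
    4: {"title": {"car"}, "text": {"car", "wash"}},
}
FIELDS = ["title", "text"]


def single_field(field, term):
    """Ids of documents whose given field contains the term (q=field:term)."""
    return {i for i, doc in DOCS.items() if term in doc.get(field, set())}


def any_field(term):
    """Ids matching the term in at least one field (the disjunction case)."""
    matched = set()
    for field in FIELDS:
        matched |= single_field(field, term)
    return matched


def no_field(term):
    """Ids matching the term in none of the fields (the negated case)."""
    return set(DOCS) - any_field(term)


per_field = {field: len(single_field(field, "auto")) for field in FIELDS}
print("per-field counts:", per_field)
print("any field:", len(any_field("auto")))
print("no field:", len(no_field("auto")))
```

In this toy model the per-field counts overlap, so they do not sum to the
"any field" count, while "any field" and "no field" together cover every
document. Checking one field at a time with q=text:auto and q=-text:auto is a
way to see whether a real index behaves the same.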

Best
Erick
On Mon, Aug 27, 2012 at 7:49 AM, amitesh116 <am...@gmail.com> wrote:
> Hey Erick, Thanks to Jack I have a working solution to issue no 2. For 1 I'm
> looking for a solution.
> Please help me for that.
> debug Query are as below:
> Search for *auto*: 36 results:  parsedquery:
> "+DisjunctionMaxQuery((text:auto^0.2 | media_transcript:auto^0.4 |
> title:auto^0.9 | keywords:auto^0.7 | description:auto^0.5))"
> Search for *NOT auto*: 665 results (it should be 664 as total records =
> 700):  parsedquery: "+(-DisjunctionMaxQuery((text:auto^0.2 |
> media_transcript:auto^0.4 | title:auto^0.9 | keywords:auto^0.7 |
> description:auto^0.5)) +MatchAllDocsQuery(*:*))"
> Regards,
> Amitesh

Re: Edismax parser weird behavior

Posted by Jack Krupansky <ja...@basetechnology.com>.
Try removing fields one at a time from "qf" and see how far you have to go 
to get the numbers to add up. The offending field will offer a clue. 
Meanwhile, what are the details of the field types for these fields?
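
One way to organize that experiment is to generate the shrinking qf values
up front. The sketch below is only an illustration: the field names and
boosts are copied from the parsed query posted in this thread, while the
helper names (qf_string, shrinking_qf) and the removal order are arbitrary
choices of this example.

```python
# Fields and boosts as they appear in the parsed query from this thread.
QF = {
    "text": 0.2,
    "media_transcript": 0.4,
    "title": 0.9,
    "keywords": 0.7,
    "description": 0.5,
}


def qf_string(fields):
    """Format a field -> boost mapping as a qf parameter value."""
    return " ".join(f"{name}^{boost}" for name, boost in fields.items())


def shrinking_qf(fields):
    """Yield (removed_so_far, qf_value), dropping one more field each step."""
    names = list(fields)
    for cut in range(1, len(names)):
        removed = names[:cut]
        remaining = {name: fields[name] for name in names[cut:]}
        yield removed, qf_string(remaining)


for removed, qf in shrinking_qf(QF):
    print(f"removed {removed}: qf={qf}")
```

Running the *auto* and *NOT auto* searches with each qf value and noting the
first step at which the two counts add up to the total should point at the
field worth inspecting.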

Also, look at the id field for all 36 docs from the first query and add that 
id check to your second query to see which of the 36 is not found in the 
second results. There will be something odd/distinct about that document.
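
If it is easier to compare the ids offline, the same check can be done on
three id lists: all documents, the *auto* matches, and the *NOT auto*
matches. This is a minimal sketch with made-up ids; how the lists are
fetched (for example by requesting only the id field with a large enough
rows value) is left out and is an assumption about your setup.

```python
def compare_id_sets(all_ids, match_ids, not_match_ids):
    """Report ids that break the expected match / no-match partition."""
    all_ids = set(all_ids)
    match_ids = set(match_ids)
    not_match_ids = set(not_match_ids)
    return {
        # Returned by both the query and its negation.
        "in_both": sorted(match_ids & not_match_ids),
        # Returned by neither the query nor its negation.
        "in_neither": sorted(all_ids - match_ids - not_match_ids),
        # Returned by a query but missing from the full id list.
        "unknown": sorted((match_ids | not_match_ids) - all_ids),
    }


# Hypothetical ids, only to show the shape of the report.
report = compare_id_sets(
    all_ids=["d1", "d2", "d3", "d4"],
    match_ids=["d1", "d2"],
    not_match_ids=["d2", "d3", "d4"],
)
print(report)
```

Any id listed under in_both or in_neither is a document worth opening to see
what is distinct about it.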

-- Jack Krupansky



Re: Edismax parser weird behavior

Posted by amitesh116 <am...@gmail.com>.
Hey Erick, thanks to Jack I have a working solution for issue 2. I'm still
looking for a solution to issue 1; please help me with that.
The debug queries are below.
Search for *auto*: 36 results. parsedquery:
"+DisjunctionMaxQuery((text:auto^0.2 | media_transcript:auto^0.4 |
title:auto^0.9 | keywords:auto^0.7 | description:auto^0.5))"
Search for *NOT auto*: 665 results (it should be 664, as total records =
700). parsedquery: "+(-DisjunctionMaxQuery((text:auto^0.2 |
media_transcript:auto^0.4 | title:auto^0.9 | keywords:auto^0.7 |
description:auto^0.5)) +MatchAllDocsQuery(*:*))"
Regards,
Amitesh


--
View this message in context: http://lucene.472066.n3.nabble.com/Edismax-parser-weird-behavior-tp4002626p4003446.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Edismax parser weird behavior

Posted by Erick Erickson <er...@gmail.com>.
What do you get when you specify &debugQuery=on (&debug=query in 4.x)?
In other words, what does the parsed query look like?
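
For anyone scripting this, a minimal Python sketch follows. The endpoint
(localhost:8983, collection1) is hypothetical, and the sample response body
is hand-written for illustration rather than captured from a server; it
assumes the JSON response carries the parsed query under debug.parsedquery.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core to your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def debug_url(query):
    """Build an edismax request URL that asks for debug output as JSON."""
    params = {
        "defType": "edismax",
        "q": query,
        "debugQuery": "on",
        "wt": "json",
        "rows": "0",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def parsed_query(response_text):
    """Pull the parsed query string out of a JSON response body."""
    return json.loads(response_text)["debug"]["parsedquery"]


# Hand-written sample body, not real server output.
sample = json.dumps({"debug": {"parsedquery": "+DisjunctionMaxQuery((text:auto))"}})

print(debug_url("NOT auto"))
print(parsed_query(sample))
```

Fetching the URL with any HTTP client and passing the body to parsed_query
gives the string to compare between the *auto* and *NOT auto* searches.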

Best
Erick
