Posted to solr-user@lucene.apache.org by "Dyer, James" <Ja...@ingrambook.com> on 2010/12/22 16:25:10 UTC

edismax inconsistency -- AND/OR

I'm using SOLR 1.4.1 with SOLR-1553 applied (edismax query parser).  I'm experiencing inconsistent behavior with terms grouped in parentheses.  Sometimes they are AND'ed and sometimes OR'ed together.

1. q=Title:(life)&defType=edismax  > 285 results
2. q=Title:(hope)&defType=edismax  > 34 results

3. q=Title:(life AND hope)&defType=edismax > 1 result
4. q=Title:(life OR hope)&defType=edismax > 318 results
5. q=Title:(life hope)&defType=edismax  > 1 result (life, hope are being AND'ed together)

6. q=Title:(life AND hope) AND Title:(life)&defType=edismax > 1 result
7. q=Title:(life OR hope) AND Title:(life)&defType=edismax > 285 results
8. q=Title:(life hope) AND Title:(life)&defType=edismax > 285 results (life, hope are being OR'ed together)

See how in #5 the two terms get AND'ed, but after adding the additional (redundant) clause in #8, the first two terms get OR'ed.  Is this a feature or a bug?  Am I doing something wrong?
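
The result counts above are enough to tell which operator was applied in each case. A small sketch of that arithmetic, using only the numbers reported in queries 1-8 (the function name is made up for illustration):

```python
def union_count(a: int, b: int, both: int) -> int:
    """Inclusion-exclusion: size of (A OR B) given |A|, |B| and |A AND B|."""
    return a + b - both

life, hope, life_and_hope = 285, 34, 1   # queries 1, 2 and 3
life_or_hope = union_count(life, hope, life_and_hope)
print(life_or_hope)  # 318, the count reported for query 4

# (life OR hope) AND life  is just the set of "life" documents  -> 285
# (life AND hope) AND life is just the "life AND hope" documents -> 1
# Query 8 returned 285, so its first group was evaluated as OR;
# query 5 returned 1, so the same group was evaluated as AND.
```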

I've tried this both with ...defaultOperator="AND"... and ...defaultOperator="OR"...  I've also tried the two settings with "q.op".  It seems as if edismax doesn't use these at all.

When using the default query parser, I get consistent AND/OR logic as expected.  That is, the "defaultOperator" (or q.op, if specified) is always consistently applied.

As a workaround, I think I can just always insert the operator (as in examples 6 & 7).  However, this is an extra burden on our clients that I'd like to avoid if at all possible.
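
A minimal sketch of what that client-side workaround might look like. The helper name is hypothetical, and it only handles bare whitespace-separated terms; quoted phrases, nested parentheses, and +/- prefixes would need a real tokenizer:

```python
OPERATORS = {"AND", "OR", "NOT"}

def insert_explicit_and(user_terms: str) -> str:
    """Join bare terms with an explicit AND, leaving existing operators alone.

    Hypothetical helper: assumes plain whitespace-separated terms only.
    """
    out = []
    for tok in user_terms.split():
        # Add AND only between two adjacent non-operator tokens.
        if out and tok not in OPERATORS and out[-1] not in OPERATORS:
            out.append("AND")
        out.append(tok)
    return " ".join(out)

# Example: build the grouped field query from example 5.
q = "Title:(" + insert_explicit_and("life hope") + ")"
print(q)  # Title:(life AND hope)
```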

See below for more configuration information.  Any ideas are appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


Snippets from schema.xml:

<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

...

<field name="Title" type="textStemmed" indexed="true" stored="true" multiValued="false" omitNorms="true" omitTermFreqAndPositions="false" />

...

<solrQueryParser defaultOperator="AND"/>


RE: edismax inconsistency -- AND/OR

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Shawn,

Thank you for the reply.  The URL you gave was helpful and Smiley & Pugh even more so.  On Smiley & Pugh page 140, they indicate that mm=100% using dismax is analogous to Standard's "q.op=AND".  This is exactly what I need.

However, testing these queries with edismax, I get different numbers of results:

q=Title:(life hope) AND Title:(life)&q.op=AND  (STANDARD Q.P.) - 1 result
q=Title:(life AND hope) AND Title:(life)&defType=edismax - 1 result
q=Title:(life hope) AND Title:(life)&defType=edismax&mm=100% - 285 results (uh-oh.  looks like the first 2 get OR'ed)

The dismax parser seems to behave as documented:

q=life hope life&defType=dismax&rows=0&qf=Title&mm=0% - 285 results (results are OR'ed as expected)
q=life hope life&defType=dismax&rows=0&qf=Title&mm=100% - 1 result (results are AND'ed as expected)
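
When reproducing these requests from a script, note that the literal "%" in mm=100% generally has to be percent-encoded (as %25) in the URL. A small sketch using Python's standard library; the base URL below is an assumption for a local test instance, not something taken from this setup:

```python
from urllib.parse import urlencode

# Assumed location of a local Solr select handler; adjust as needed.
BASE_URL = "http://localhost:8983/solr/select"

def build_select_url(base_url: str, **params) -> str:
    """Return base_url plus a properly encoded query string."""
    return base_url + "?" + urlencode(params)

url = build_select_url(
    BASE_URL,
    q="life hope life",
    defType="dismax",
    rows=0,
    qf="Title",
    mm="100%",   # urlencode turns this into mm=100%25
)
print(url)
```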

Unfortunately, I need to be able to combine the use of "pf" with "key:value" syntax, wildcards, etc., so I think I need to use edismax.

With a quick glance at ExtendedDismaxQParserPlugin, I'm finding...
 - MM is ignored if there are any of these operators in the query (OR NOT + -)  ... but AND is ok (line 227)
 - MM is ignored if the "parse" method did not return a "BooleanQuery" instance (line 244)
 - MM is used after all regardless of operators used in the query, so long as it's a "BooleanQuery" (line 286)
 - The default MM value is "100%" if not specified in the query parameters (lines 241, 283)
Given the apparent contradiction here, my very quick analysis is surely missing something!  But if this is accurate, then the trick is to formulate the query in such a way so that "parse" returns an instance of "BooleanQuery", right?  
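
For reference, here is a rough sketch of how a simple mm value translates into a required number of optional clauses, following the rules in the min-should-match documentation (percentages are rounded down; negative values say how many clauses may be missing). This is not the Solr code itself, the function name is made up, and conditional specs such as "3<90%" are not handled:

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Number of optional clauses that must match for a simple mm spec.

    Handles plain integers ("2", "-1") and percentages ("75%", "-25%") only.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # The count computed from a percentage is rounded down.
        count = abs(percent) * optional_clauses // 100
        negative = percent < 0
    else:
        value = int(spec)
        count = abs(value)
        negative = value < 0
    # A negative spec gives the number of clauses allowed to be missing.
    required = optional_clauses - count if negative else count
    # Clamp to the range [0, optional_clauses].
    return max(0, min(required, optional_clauses))

print(min_should_match(2, "100%"))  # 2: both terms required, behaves like AND
print(min_should_match(2, "0%"))    # 0: no term required, behaves like OR
```

Under these rules, mm only constrains the optional ("should") clauses of a single BooleanQuery, which is consistent with the observation that the shape of the parsed query matters.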

Any more advice anyone can give is appreciated!  For the client I'm responsible for, I'm just inserting explicit operators between all of the terms in the user's queries.  But for the client I'm not responsible for, I would love to have a workaround for the other developers!  I think they'd appreciate it...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org] 
Sent: Wednesday, December 22, 2010 4:08 PM
To: solr-user@lucene.apache.org
Subject: Re: edismax inconsistency -- AND/OR

On 12/22/2010 8:25 AM, Dyer, James wrote:
> I'm using SOLR 1.4.1 with SOLR-1553 applied (edismax query parser).  I'm experiencing inconsistent behavior with terms grouped in parentheses.  Sometimes they are AND'ed and sometimes OR'ed together.
>
> 1. q=Title:(life)&defType=edismax > 285 results
> 2. q=Title:(hope)&defType=edismax > 34 results
>
> 3. q=Title:(life AND hope)&defType=edismax > 1 result
> 4. q=Title:(life OR hope)&defType=edismax > 318 results
> 5. q=Title:(life hope)&defType=edismax > 1 result (life, hope are being AND'ed together)
>
> 6. q=Title:(life AND hope) AND Title:(life)&defType=edismax > 1 result
> 7. q=Title:(life OR hope) AND Title:(life)&defType=edismax > 285 results
> 8. q=Title:(life hope) AND Title:(life)&defType=edismax > 285 results (life, hope are being OR'ed together)
>
> See how in #5, the two terms get AND'ed, but by adding the additional (nonsense) clause in #8, the first two terms get OR'ed.  Is this a feature or a bug?  Am I likely doing something wrong?

The dismax parser doesn't pay any attention to the default query 
operator.  In the absence of explicit operators in the actual query, 
edismax likely doesn't either.  What matters is the value of the mm 
parameter, also known as "minimum 'should' match."  If your mm value is 
50%, which is a common value to see in dismax examples, I believe it 
would behave exactly like you are seeing.

This is a complex little beast.  Just a couple of weeks ago, Chris 
Hostetter said that although he wrote the code and the syntax for mm, 
the explanation for the parameter that's in the Smiley and Pugh Solr 
book (pages 138-140) is the clearest he's ever seen.

Here's some detailed documentation on it.  I can't find my copy of the 
book right now, so I don't know if this is as good as what's in it:

http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

Hopefully this is applicable to you, and not something you already 
thought of!

Shawn

