Posted to java-user@lucene.apache.org by Omar Didi <od...@Cyveillance.com> on 2005/03/01 15:12:01 UTC

RE: help with boolean expression

I found something kind of weird about the way Lucene interprets boolean expressions without parentheses.
When I run the query A AND B OR C, it returns only the documents that have A (in other words, as if the query was just the term A).
When I run the query A OR B AND C, it returns only the documents that have B AND C (as if the query was just B AND C). I set the default operator in my application to be AND.
Can anyone explain this behavior? Thanks.

-----Original Message-----
From: Morus Walter [mailto:morus.walter@tanto.de]
Sent: Monday, February 28, 2005 2:40 AM
To: Lucene Users List
Subject: Re: help with boolean expression


Omar Didi writes:
> I have a problem understanding how Lucene would interpret this boolean expression: A AND B OR C.
> It returns neither the same count as when I enter (A AND B) OR C nor as A AND (B OR C).
> If anyone knows how it is interpreted I would be thankful.
> Thanks

A AND B OR C creates a query that requires A and B. C influences the 
score, but is neither sufficient nor required for a match.

IMO query parser is broken for queries mixing AND and OR without explicit
braces.
My favorite sample is `a AND b OR c AND d' which equals `a AND b AND c AND d'
in query parser.
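
To see why the counts differ, here is a small sketch in plain Java (no Lucene calls) that enumerates all eight presence/absence combinations of A, B and C and counts the matches for three readings of the query. It assumes the parsed form is "A and B required, C optional" as described above; the class and field names are made up for illustration.

```java
import java.util.function.Predicate;

// Hypothetical helper, not part of Lucene: compares three readings of "A AND B OR C".
class PrecedenceCheck {
    // doc[0], doc[1], doc[2] say whether the document contains A, B, C.

    // Reading described above: A and B are required, C only affects the score.
    static final Predicate<boolean[]> AS_PARSED = d -> d[0] && d[1];
    // (A AND B) OR C
    static final Predicate<boolean[]> AND_FIRST = d -> (d[0] && d[1]) || d[2];
    // A AND (B OR C)
    static final Predicate<boolean[]> OR_FIRST = d -> d[0] && (d[1] || d[2]);

    // Counts how many of the 8 possible documents a reading matches.
    static int countMatches(Predicate<boolean[]> reading) {
        int matches = 0;
        for (int bits = 0; bits < 8; bits++) {
            boolean[] doc = { (bits & 4) != 0, (bits & 2) != 0, (bits & 1) != 0 };
            if (reading.test(doc)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("as parsed:      " + countMatches(AS_PARSED)); // 2
        System.out.println("(A AND B) OR C: " + countMatches(AND_FIRST)); // 5
        System.out.println("A AND (B OR C): " + countMatches(OR_FIRST));  // 3
    }
}
```

The three readings match 2, 5 and 3 of the 8 combinations, so on a real index the three hit counts will generally all differ.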

I suggested a patch some time ago, but it's still pending in bugzilla.
http://issues.apache.org/bugzilla/show_bug.cgi?id=25820

Don't know if it's still usable with current sources.

Morus

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org




Re: help with boolean expression

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 3, 2005, at 1:32 PM, Doug Cutting wrote:
> Daniel Naber wrote:
>> On Wednesday 02 March 2005 12:25, Erik Hatcher wrote:
>>> I agree that the current behavior is awkward.  Is it worth breaking
>>> backwards compatibility to correct this with the patch applied?
>> I'd vote for fixing this as long as the current QueryParser is still 
>> available in Lucene core, maybe as OldQueryParser or 
>> NoPrecedenceQueryParser. It's surely better to fix it for 2.0 than 
>> for 2.x.
>
> +1.  Fixing operator precedence seems to me like an acceptable 
> incompatibility.  The change needs to be well documented in release 
> notes, and the old QueryParser should be available, deprecated, for a 
> time for back-compatibility.

Ok, since I'm now deep into JavaCC for a consulting project, I will 
attempt this change at some point in the near future.  It's actually 
not too hard.  If someone wants to beat me to it feel free.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: help with boolean expression

Posted by Doug Cutting <cu...@apache.org>.
Daniel Naber wrote:
> On Wednesday 02 March 2005 12:25, Erik Hatcher wrote:
> 
>>I agree that the current behavior is awkward.  Is it worth breaking
>>backwards compatibility to correct this with the patch applied?
> 
> I'd vote for fixing this as long as the current QueryParser is still available 
> in Lucene core, maybe as OldQueryParser or NoPrecedenceQueryParser. It's 
> surely better to fix it for 2.0 than for 2.x.

+1.  Fixing operator precedence seems to me like an acceptable 
incompatibility.  The change needs to be well documented in release 
notes, and the old QueryParser should be available, deprecated, for a 
time for back-compatibility.

Doug



Re: help with boolean expression

Posted by Daniel Naber <da...@t-online.de>.
On Wednesday 02 March 2005 12:25, Erik Hatcher wrote:

> I agree that the current behavior is awkward.  Is it worth breaking
> backwards compatibility to correct this with the patch applied?

I'd vote for fixing this as long as the current QueryParser is still available 
in Lucene core, maybe as OldQueryParser or NoPrecedenceQueryParser. It's 
surely better to fix it for 2.0 than for 2.x.

Regards
 Daniel



Re: help with boolean expression

Posted by Yonik Seeley <ys...@gmail.com>.
> I agree that the current behavior is awkward.  Is it worth breaking
> backwards compatibility to correct this with the patch applied?

IMHO, definitely.
The current behavior is so surprising that I doubt anyone is
relying on it.

I didn't really see this behavior explicitly pointed out in "Lucene in
Action".  There is a sentence at the end of 3.4.4 that could allude to
it,  but nothing that hits you over the head.

-Yonik



Not storing norms and term positions info, possible?

Posted by eks dev <ek...@yahoo.co.uk>.
Hi,
Is there a way to create an index that does not store
norms (.f..) and positions (.prx files)?

In the case I need to support, no length normalisation
is needed, and the same goes for positional info.
(Similarity.encodeNorms(float) returns 0; and
Term.SetPositionalIncrement(0) is used)

From the size of these files I speculate that a lot of I/O
happens during indexing.

Does that make sense?

thanks, Eks




  
 




Re: help with boolean expression

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
I'm deep into implementing a custom (not generalizable, sorry) query 
parser and am evaluating this very issue now.

Lucene indeed does some funny stuff with boolean operators.  Output the 
toString of your resulting Query objects to see the details, or have a look 
at the Bugzilla issue that Morus mentions below.

First some background: BooleanQuery clauses each have their own set of 
required/optional/prohibited attributes.  Putting operators between 
them is awkward in the Lucene sense because the parser has to set each 
clause individually, not in relation to another one.

QueryParser takes the most recent operator and applies it to both the 
clause before and after, and there is no sense of operator precedence 
(such as in a math expression like 1 + 2 * 3).  In the case of A AND B 
OR C, when the AND is encountered, it sets the required attribute for 
both A and B, but then when the OR is encountered it sets the optional 
attribute on B and C, stepping on the previous required flag for B.  
And similarly with A OR B AND C.

I agree that the current behavior is awkward.  Is it worth breaking 
backwards compatibility to correct this with the patch applied?

As for the default operator, it is not coming into play in your 
expression examples because all clauses there have an explicit 
conjunction that sets the flag.  The default operator comes into play 
for an expression like A B AND C and would be used to set the flag on 
the A clause.
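
The flag setting described above can be modelled in a few lines of plain Java. This is a simplified sketch, not the QueryParser source: it handles only bare terms joined by AND/OR, the class and method names are invented, and the branch for a default operator of OR (where an OR is assumed to leave an already-set required flag alone) is an assumption chosen to agree with the results Morus reports. Output follows the toString convention of prefixing required clauses with '+'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of per-clause flag setting; not the actual Lucene QueryParser.
class ClauseFlagSketch {
    // Returns the clauses with '+' in front of each one that ends up required.
    static String flags(String query, boolean defaultAnd) {
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        String conj = null; // explicit operator seen just before the current term

        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) {
                conj = tok;
                continue;
            }
            int prev = terms.size() - 1;
            // An AND also marks the clause before it as required.
            if (prev >= 0 && "AND".equals(conj)) {
                required.set(prev, true);
            }
            // With default AND, an OR also makes the clause before it optional,
            // overwriting whatever an earlier AND had set (no precedence).
            if (prev >= 0 && defaultAnd && "OR".equals(conj)) {
                required.set(prev, false);
            }
            // Flag for the new clause: with default AND it is required unless
            // introduced by OR; with default OR it is required only after AND.
            boolean req = defaultAnd ? !"OR".equals(conj) : "AND".equals(conj);
            terms.add(tok);
            required.add(req);
            conj = null;
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) out.append(' ');
            if (required.get(i)) out.append('+');
            out.append(terms.get(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(flags("A AND B OR C", true));        // +A B C
        System.out.println(flags("A OR B AND C", true));        // A +B +C
        System.out.println(flags("a AND b OR c AND d", false)); // +a +b +c +d
        System.out.println(flags("A B AND C", true));           // +A +B +C
        System.out.println(flags("A B AND C", false));          // A +B +C
    }
}
```

With the default operator set to AND, the model gives +A B C and A +B +C for the two queries in question, which matches the reported results (only A required in the first, B and C required in the second). The last two lines show the default operator deciding the flag on A when no explicit operator touches it.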

	Erik


On Mar 1, 2005, at 9:12 AM, Omar Didi wrote:

> I found something kind of weird about the way Lucene interprets 
> boolean expressions without parentheses.
> When I run the query A AND B OR C, it returns only the documents that 
> have A (in other words, as if the query was just the term A).
> When I run the query A OR B AND C, it returns only the documents that 
> have B AND C (as if the query was just B AND C). I set the default 
> operator in my application to be AND.
> Can anyone explain this behavior? Thanks.
>
> -----Original Message-----
> From: Morus Walter [mailto:morus.walter@tanto.de]
> Sent: Monday, February 28, 2005 2:40 AM
> To: Lucene Users List
> Subject: Re: help with boolean expression
>
>
> Omar Didi writes:
>> I have a problem understanding how Lucene would interpret this boolean 
>> expression: A AND B OR C.
>> It returns neither the same count as when I enter (A AND B) OR C nor as A 
>> AND (B OR C).
>> If anyone knows how it is interpreted I would be thankful.
>> Thanks
>
> A AND B OR C creates a query that requires A and B. C influences the
> score, but is neither sufficient nor required for a match.
>
> IMO query parser is broken for queries mixing AND and OR without explicit
> braces.
> My favorite sample is `a AND b OR c AND d' which equals `a AND b AND c AND d'
> in query parser.
>
> I suggested a patch some time ago, but it's still pending in bugzilla.
> http://issues.apache.org/bugzilla/show_bug.cgi?id=25820
>
> Don't know if it's still usable with current sources.
>
> Morus

