You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Ard Schrijvers <a....@hippo.nl> on 2007/10/26 09:36:58 UTC

Search performance using BooleanQueries in BooleanQueries

Hello, 

I am seeing that a query with boolean queries in boolean queries takes
much longer than just a single boolean query when the number of hits if
fairly large. For example 

+prop1:a +prop2:b +prop3:c +prop4:d +prop5:e 

is much faster than 

(+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)

where the second one is a result from BooleanQuery in BooleanQuery, and
all have Occur.MUST.

Is there a way to detect and rewrite the second inefficient query?
query.rewrite() does not change the query AFAICS. 

thanks for any help,

Regards Ard

-- 

Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel  +31 (0)20 5224466
-------------------------------------------------------------
a.schrijvers@hippo.nl / ard@apache.org / http://www.hippo.nl
-------------------------------------------------------------- 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Search performance using BooleanQueries in BooleanQueries

Posted by Ard Schrijvers <a....@hippo.nl>.


> On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
> > Hello,
> >
> > I am seeing that a query with boolean queries in boolean 
> queries takes 
> > much longer than just a single boolean query when the 
> number of hits 
> > if fairly large. For example
> >
> > +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
> >
> > is much faster than
> >
> > (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
> >
> > where the second one is a result from BooleanQuery in BooleanQuery, 
> > and all have Occur.MUST.
> >
> > Is there a way to detect and rewrite the second inefficient query?
> > query.rewrite() does not change the query AFAICS.
> 
> SImplifying boolean queries like this is not available in 
> Lucene, but it would have a positive effect on search 
> performance, especially when prop1:a and prop2:b have a high 
> document frequency.
> 
> You could write this yourself, for example by overriding 
> BooleanQuery.rewrite(). Take care about query weights, though.

Thanks for the pointer!

Regards Ard

> 
> Regards,
> Paul Elschot
> 
> 
> >
> > thanks for any help,
> >
> > Regards Ard
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Search performance using BooleanQueries in BooleanQueries

Posted by Mike Klaas <mi...@gmail.com>.

On 6-Nov-07, at 3:02 PM, Paul Elschot wrote:

> On Tuesday 06 November 2007 23:14:01 Mike Klaas wrote:

>> Wait--shouldn't the outer-most BooleanQuery provide most of this
>> speedup already (since it should be skipTo'ing between the nested
>> BooleanQueries and the outermost).  Is it the indirection and sub-
>> query management that is causing the performance difference, or
>> differences in skiptTo behaviour?
>
> The usual Lucene answer to performance questions: it depends.
>
> After every hit, next() needs to be called on a subquery before
> skipTo() can be used to find the next hit. It is currently not  
> defined which
> subquery will be used for this first next().
>
> The structure of the scorers normally follows the structure of
> the BooleanQueries, so the indirection over the deep subquery
> scores could well  be relevant to performance, too.
>
> Which of these factors actually dominates performance is hard
> to predict in advance. The point of skipTo() is that is tries to avoid
> disk I/O as much as possible for the first time that the query is
> executed. Later executions are much more likely to hit the OS cache,
> and then the indirections will be more relevant to performance.
>
> I'd like to have a good way to do a performance test on a first
> query execution, in the sense that it does not hit the OS cache
> for its skipTo() executions, but I have not found a good way yet.

Interesting--thanks for the thoughtful answer.

-Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Search performance using BooleanQueries in BooleanQueries

Posted by Paul Elschot <pa...@xs4all.nl>.

On Tuesday 06 November 2007 23:14:01 Mike Klaas wrote:
> On 29-Oct-07, at 9:43 AM, Paul Elschot wrote:
> > On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
> >> +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
> >>
> >> is much faster than
> >>
> >> (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
> >>
> >> where the second one is a result from BooleanQuery in
> >> BooleanQuery, and
> >> all have Occur.MUST.
> >
> > SImplifying boolean queries like this is not available in Lucene,
> > but it
> > would have a positive effect on search performance, especially when
> > prop1:a and prop2:b have a high document frequency.
>
> Wait--shouldn't the outer-most BooleanQuery provide most of this
> speedup already (since it should be skipTo'ing between the nested
> BooleanQueries and the outermost).  Is it the indirection and sub-
> query management that is causing the performance difference, or
> differences in skiptTo behaviour?

The usual Lucene answer to performance questions: it depends.

After every hit, next() needs to be called on a subquery before
skipTo() can be used to find the next hit. It is currently not defined which 
subquery will be used for this first next().

The structure of the scorers normally follows the structure of
the BooleanQueries, so the indirection over the deep subquery
scores could well  be relevant to performance, too.

Which of these factors actually dominates performance is hard
to predict in advance. The point of skipTo() is that is tries to avoid
disk I/O as much as possible for the first time that the query is
executed. Later executions are much more likely to hit the OS cache,
and then the indirections will be more relevant to performance.

I'd like to have a good way to do a performance test on a first
query execution, in the sense that it does not hit the OS cache
for its skipTo() executions, but I have not found a good way yet.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Search performance using BooleanQueries in BooleanQueries

Posted by Mike Klaas <mi...@gmail.com>.

On 29-Oct-07, at 9:43 AM, Paul Elschot wrote:

> On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
>> +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
>>
>> is much faster than
>>
>> (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
>>
>> where the second one is a result from BooleanQuery in  
>> BooleanQuery, and
>> all have Occur.MUST.
>>
>
> SImplifying boolean queries like this is not available in Lucene,  
> but it
> would have a positive effect on search performance, especially when
> prop1:a and prop2:b have a high document frequency.

Wait--shouldn't the outer-most BooleanQuery provide most of this  
speedup already (since it should be skipTo'ing between the nested  
BooleanQueries and the outermost).  Is it the indirection and sub- 
query management that is causing the performance difference, or  
differences in skiptTo behaviour?

-Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Search performance using BooleanQueries in BooleanQueries

Posted by Paul Elschot <pa...@xs4all.nl>.

On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
> Hello,
>
> I am seeing that a query with boolean queries in boolean queries takes
> much longer than just a single boolean query when the number of hits if
> fairly large. For example
>
> +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
>
> is much faster than
>
> (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
>
> where the second one is a result from BooleanQuery in BooleanQuery, and
> all have Occur.MUST.
>
> Is there a way to detect and rewrite the second inefficient query?
> query.rewrite() does not change the query AFAICS.

SImplifying boolean queries like this is not available in Lucene, but it
would have a positive effect on search performance, especially when
prop1:a and prop2:b have a high document frequency.

You could write this yourself, for example by overriding 
BooleanQuery.rewrite(). Take care about query weights, though.

Regards,
Paul Elschot


>
> thanks for any help,
>
> Regards Ard



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org