Posted to java-user@lucene.apache.org by eks dev <ek...@yahoo.co.uk> on 2009/07/12 18:53:46 UTC

speed of BooleanQueries on 2.9

Is it possible that the same BooleanQuery on 2.9 runs significantly slower than on 2.4?

We see a strange effect: the following query runs approximately 4 times (ouch!) slower on 2.9 in a test that executes the same query 1000 times. But if I run a test from a real query log with mixed queries, I get almost the same results (?!), even slightly faster on 2.9!?


Query: 
+((NAME:hans NAME:hahns^0.23232001 NAME:hams^0.27648002 NAME:hamz^0.25392 NAME:hanas^0.18722998 NAME:hanbs^0.18722998 NAME:hanfs^0.18722998 NAME:hangs^0.18722998 NAME:hanhs^0.24030754 NAME:hanis^0.18722998 NAME:hanjs^0.18722998 NAME:hanks^0.18722998 NAME:hanms^0.18722998 NAME:hanos^0.18722998 NAME:hanrs^0.18722998 NAME:hansb^0.20172001 NAME:hansd^0.20172001 NAME:hansf^0.20172001 NAME:hansg^0.20172001 NAME:hansi^0.20172001 NAME:hansj^0.20172001 NAME:hansk^0.20172001 NAME:hansl^0.20172001 NAME:hansn^0.20172001 NAME:hanso^0.20172001 NAME:hansp^0.20172001 NAME:hanst^0.20172001 NAME:hansu^0.20172001 NAME:hansw^0.20172001 NAME:hansy^0.20172001 NAME:hansz^0.20172001 NAME:hants^0.18722998 NAME:hanus^0.18722998 NAME:hanws^0.18722998 NAME:hehns^0.20172001 NAME:hens^0.2736075 NAME:hins^0.24843 NAME:hons^0.24843 NAME:huhns^0.1801875 NAME:huns^0.24843)^2.0) 
+(((ZIPS:berlin ZIPS:barlin^0.28227 ZIPS:berien^0.25947002 ZIPS:berling^0.23232001 ZIPS:perlin^0.26133335))^1.2)

The question is just to get some hints where I should look... 

Both fields are without norms, omitTf(true), RAMDirectory, using
TopDocs top = ixSearcher.search(q, null, getMaxNumOfCandidates());
and BooleanQuery.setAllowDocsOutOfOrder(true);

Maybe we made some mistakes in measuring, but we did simple timing here on the search() method... strange. I would bet it is something we did, but I cannot see where...
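
For readers who want to reproduce this kind of "simple timing", here is a minimal sketch of one way to do it. It is not the code used in the test above. It records every iteration separately and reports the median, so a single GC or JIT pause does not dominate the result. The class name SearchTimingSketch, its methods, and the fakeSearch workload are assumptions made for illustration; in a real test the Runnable would wrap the actual ixSearcher.search(...) call.

```java
import java.util.Arrays;

final class SearchTimingSketch {

    // Written to by the stand-in workload so the JIT cannot discard the loop.
    static volatile long sink;

    /** Runs the task the given number of times and returns each run's duration in nanoseconds. */
    static long[] timeEach(Runnable task, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Median of the samples; a single long pause moves it far less than it moves the mean. */
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real ixSearcher.search(q, null, n) call.
        Runnable fakeSearch = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i % 7;
            }
            sink = sum;
        };
        long[] samples = timeEach(fakeSearch, 1000);
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        System.out.println("min ns:    " + sorted[0]);
        System.out.println("median ns: " + median(samples));
        System.out.println("max ns:    " + sorted[sorted.length - 1]);
    }
}
```

If min and median are close but max is far away, the slowdown is more likely a measurement artifact than a property of the query.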



      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: speed of BooleanQueries on 2.9

Posted by eks dev <ek...@yahoo.co.uk>.
Hi Mike, 

getMaxNumOfCandidates() in the test was 200; the index is optimised and read-only.

We found (due to an error in our warm-up code, funny) that only this Query runs slower on 2.9. 

A hint on where to look could be that this query contains the two most frequent tokens in two particular fields,
NAME:hans and ZIPS:berlin (the index has ca. 80 million very short documents and 3 million unique terms).

But all of this *could be just wrong measurement*; I just could not spend more time to get to the bottom of this. We moved forward as we got better overall performance (a sweet 10% on average) on a much bigger real query log from our regression test.

Anyhow I just wanted to throw it out, maybe it triggers some synapses :) If false alarm, sorry. 
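
A better average over a query log and one much slower query can both be true at once. The sketch below shows one way to look for such outliers, assuming per-query timings from both versions are available as maps. The class RegressionFinderSketch, its method names, and all numbers are made up for illustration; they are not measurements from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RegressionFinderSketch {

    /** Returns the queries whose time after the upgrade is at least factor times the time before. */
    static List<String> slowerBy(Map<String, Long> before, Map<String, Long> after, double factor) {
        List<String> regressed = new ArrayList<>();
        for (Map.Entry<String, Long> e : before.entrySet()) {
            Long newTime = after.get(e.getKey());
            if (newTime != null && e.getValue() > 0 && newTime >= factor * e.getValue()) {
                regressed.add(e.getKey());
            }
        }
        return regressed;
    }

    /** Arithmetic mean of all timings, or 0.0 for an empty map. */
    static double mean(Map<String, Long> times) {
        return times.values().stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Invented timings in nanoseconds: one query gets 4x slower, nine get 2x faster.
        Map<String, Long> before = new LinkedHashMap<>();
        Map<String, Long> after = new LinkedHashMap<>();
        before.put("q0", 100L);
        after.put("q0", 400L);
        for (int i = 1; i < 10; i++) {
            before.put("q" + i, 100L);
            after.put("q" + i, 50L);
        }
        System.out.println("mean before: " + mean(before));
        System.out.println("mean after:  " + mean(after));
        System.out.println("regressed:   " + slowerBy(before, after, 2.0));
    }
}
```

With these invented numbers the mean improves even though q0 is four times slower, which is why a per-query comparison is worth running next to the averaged regression test.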





----- Original Message ----
> From: Michael McCandless <lu...@mikemccandless.com>
> To: java-user@lucene.apache.org
> Sent: Monday, 13 July, 2009 11:50:48
> Subject: Re: speed of BooleanQueries on 2.9
> 
> This is not expected; 2.9 has had a number of changes that ought to
> reduce CPU cost of searching.  If this holds up we definitely need to
> get to the root cause.
> 
> Did your test exclude the warmup query for both 2.4.1 & 2.9?  How many
> segments in the index?  What is the actual value of
> getMaxNumOfCandidates()?  If you simplify the query down (eg just do
> the NAME clause or the ZIPS clause, alone) are those also 4X slower?
> 
> Mike
> 



      



Re: speed of BooleanQueries on 2.9

Posted by Michael McCandless <lu...@mikemccandless.com>.
This is not expected; 2.9 has had a number of changes that ought to
reduce CPU cost of searching.  If this holds up we definitely need to
get to the root cause.

Did your test exclude the warmup query for both 2.4.1 & 2.9?  How many
segments in the index?  What is the actual value of
getMaxNumOfCandidates()?  If you simplify the query down (eg just do
the NAME clause or the ZIPS clause, alone) are those also 4X slower?

Mike
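
The two checks asked about above, excluding warm-up and timing each clause alone, can be sketched as a small harness. This is only an illustration under stated assumptions: WarmupBenchSketch, its methods, and the spin() workloads are hypothetical stand-ins, and in a real run each Runnable would execute the corresponding Lucene query against the same searcher on 2.4.1 and on 2.9.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WarmupBenchSketch {

    // Written to by the stand-in workload so the JIT cannot discard the loop.
    static volatile long sink;

    /** Runs warmup untimed iterations, then returns the average nanoseconds over measured timed ones. */
    static double averageNanosAfterWarmup(Runnable task, int warmup, int measured) {
        if (measured <= 0) {
            throw new IllegalArgumentException("measured must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) measured;
    }

    /** Times each named variant with the same warm-up and iteration counts, keeping insertion order. */
    static Map<String, Double> compare(Map<String, Runnable> variants, int warmup, int measured) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> e : variants.entrySet()) {
            result.put(e.getKey(), averageNanosAfterWarmup(e.getValue(), warmup, measured));
        }
        return result;
    }

    /** Hypothetical CPU-bound stand-in for a search call. */
    static void spin(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i % 7;
        }
        sink = sum;
    }

    public static void main(String[] args) {
        Map<String, Runnable> variants = new LinkedHashMap<>();
        variants.put("full query", () -> spin(300_000));
        variants.put("NAME clause only", () -> spin(200_000));
        variants.put("ZIPS clause only", () -> spin(100_000));
        compare(variants, 100, 1000)
                .forEach((name, ns) -> System.out.printf("%-18s %.0f ns%n", name, ns));
    }
}
```

Running the same harness against both versions shows whether the slowdown comes from one clause or only appears when the two are combined.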

