Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2009/07/17 20:55:54 UTC

constant-score rewrite mode for NumericRangeQuery

Should we really default to constant-score rewrite with NumericRangeQuery?

Would BooleanQuery rewrite mode give better performance on a large
index, since the number of terms should be smallish w/ the default
precisionStep (4), I think?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: constant-score rewrite mode for NumericRangeQuery

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Sat, Jul 18, 2009 at 6:54 AM, Uwe Schindler<uw...@thetaphi.de> wrote:

> I did some perf tests with the well-known PerfTest.java from the
> FieldCacheRangeFilter JIRA issue.
>
> I compared a 5 million doc index with precStep=4:
>
> With constant score rewrite:
> avg number of terms: 68.3
> TRIE: best time=6.192687 ms; worst time=463.0907 ms; avg=222.64312909999998
> ms; sum=31994466
>
> With boolean rewrite:
> avg number of terms: 68.3
> TRIE: best time=12.674237 ms; worst time=583.702957 ms; avg=257.912947 ms;
> sum=31994466
>
> Both numbers were taken after some warm-up queries; the random seed was
> identical (so exactly the same queries). For this index size, constant score
> rewrite still looks faster than Boolean rewrite.

OK these are good results; thanks for running them!

> Especially the warm-up queries take much longer
> with Boolean rewrite. The problem with my test here is that the whole index
> seems to be in the OS cache. If it is not in the OS cache, I think the much
> longer time that the first Boolean queries took will become more important.

Agreed.

> In my opinion, we should keep constant score enabled.

OK +1

> My main problem with
> Boolean rewrite is the completely useless scoring. A range query should
> always have constant score. Maybe at some point in the future we could make
> it possible to disable scoring for Boolean queries (e.g.
> bq.setDoConstantScore(true)). I think this is covered by a dedicated issue in
> JIRA (I do not know the number yet).

I completely agree; we need to make it possible to do BooleanQuery
expansion method with constant scoring (I opened an issue for this
already -- LUCENE-1644).

> A second problem with Boolean rewrite: with precStep=4, it is guaranteed
> that a single query will not hit the 1024 max clause limit (see the formula
> for the theoretical max term number), so no problem there.

Right.

> The problem starts
> if you combine two or three numeric queries with
> BooleanClause.Occur.MUST in a top-level Boolean query (the typical example of
> a geo query). In this case, the Boolean queries that only consist of MUST
> clauses may be combined into one big query (correct me if I am wrong), and
> then the max clause count becomes a problem.

Actually Lucene never does structural optimizations of BooleanQuery,
and I think it should (though scores would be different).

One exception: if the BooleanQuery has a single clause, it'll rewrite
itself to the rewrite of that one sub-query.

> If we change the default, keep in mind to reopen SOLR-940, as it assumes
> constant score mode by default and Solr's default precStep is 8 ->
> *bang*. Maybe the Solr people should fix this anyway and explicitly set the
> mode for all range queries.

Let's not change the default :)

Mike



RE: constant-score rewrite mode for NumericRangeQuery

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi Mike,

I did some perf tests with the well-known PerfTest.java from the
FieldCacheRangeFilter JIRA issue.

I compared a 5 million doc index with precStep=4:

With constant score rewrite: 
avg number of terms: 68.3
TRIE: best time=6.192687 ms; worst time=463.0907 ms; avg=222.64312909999998
ms; sum=31994466

With boolean rewrite:
avg number of terms: 68.3
TRIE: best time=12.674237 ms; worst time=583.702957 ms; avg=257.912947 ms;
sum=31994466

Both numbers were taken after some warm-up queries; the random seed was
identical (so exactly the same queries). For this index size, constant score
rewrite still looks faster than Boolean rewrite. Especially the warm-up queries
take much longer with Boolean rewrite. The problem with my test here is that
the whole index seems to be in the OS cache. If it is not in the OS cache, I
think the much longer time that the first Boolean queries took will become
more important.

In my opinion, we should keep constant score enabled. My main problem with
Boolean rewrite is the completely useless scoring. A range query should
always have constant score. Maybe at some point in the future we could make
it possible to disable scoring for Boolean queries (e.g.
bq.setDoConstantScore(true)). I think this is covered by a dedicated issue in
JIRA (I do not know the number yet).

A second problem with Boolean rewrite: with precStep=4, it is guaranteed
that a single query will not hit the 1024 max clause limit (see the formula
for the theoretical max term number), so no problem there. The problem starts
if you combine two or three numeric queries with
BooleanClause.Occur.MUST in a top-level Boolean query (the typical example of
a geo query). In this case, the Boolean queries that only consist of MUST
clauses may be combined into one big query (correct me if I am wrong), and
then the max clause count becomes a problem.

If we change the default, keep in mind to reopen SOLR-940, as it assumes
constant score mode by default and Solr's default precStep is 8 ->
*bang*. Maybe the Solr people should fix this anyway and explicitly set the
mode for all range queries.
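
To put rough numbers on the max clause point, here is a minimal sketch. It
assumes the worst-case term bound given in the NumericRangeQuery
documentation, n = (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2
+ (2^precisionStep - 1), and the default BooleanQuery limit of 1024 clauses.
The class and method names are made up for illustration, and the bound is only
meaningful when bitsPerValue is a multiple of precisionStep.

```java
public class TrieTermBound {

    /**
     * Worst-case number of terms a single trie-encoded range can expand to,
     * assuming bitsPerValue is a multiple of precisionStep.
     */
    public static long maxTerms(int bitsPerValue, int precisionStep) {
        long perLevel = (1L << precisionStep) - 1;   // 2^precisionStep - 1
        long levels = bitsPerValue / precisionStep;  // number of precision levels
        // Every level except the last can contribute terms at both ends of the range.
        return (levels - 1) * perLevel * 2 + perLevel;
    }

    public static void main(String[] args) {
        final int maxClauseCount = 1024; // assumed BooleanQuery default

        for (int step : new int[] {2, 4, 8}) {
            long n = maxTerms(64, step);
            System.out.println("precisionStep=" + step
                    + " maxTerms=" + n
                    + " fitsInMaxClauseCount=" + (n <= maxClauseCount));
        }

        // Hypothetical worst case if three precStep=4 ranges ended up
        // flattened into a single BooleanQuery.
        long flattened = 3 * maxTerms(64, 4);
        System.out.println("three flattened precStep=4 ranges: " + flattened
                + " fitsInMaxClauseCount=" + (flattened <= maxClauseCount));
    }
}
```

For 64-bit values this gives 465 terms at precStep=4 (safely under 1024) and
3825 at precStep=8 (well over it), which is the *bang* case. Three precStep=4
ranges flattened into one query would come to 1395 in the worst case.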

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Friday, July 17, 2009 8:56 PM
> To: java-dev@lucene.apache.org
> Subject: constant-score rewrite mode for NumericRangeQuery
> 
> Should we really default to constant-score rewrite with NumericRangeQuery?
> 
> Would BooleanQuery rewrite mode give better performance on a large
> index, since the number of terms should be smallish w/ the default
> precisionStep (4), I think?
> 
> Mike