You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Paul Elschot <pa...@xs4all.nl> on 2005/05/22 10:25:59 UTC

Re: Submission, btree BooleanScorer

On Sunday 22 May 2005 03:09, Karl Wright wrote:
> I've been looking at the BooleanScorer code in 1.4.3 and realized that it 
has several problems.  These are:
>  
> 1) It does things in chunks of 1024 document ids.  This means it executes in 
a time that depends on the number of indexed documents.
> 2) Finding the subscorer with the lowest document id scales linearly with 
the number of scorers (corresponding to clauses in the Boolean query)
> 3) It does not implement the skipTo() method, because its technique of 
doiing 1024 document id's at a time interferes with this.  This makes it 
impossible to use a BooleanScorer within a Conjunction Scorer.
>  
> I've attached a rewritten BooleanScorer which solves these problems.  It 
basically uses a btree to keep the individual subscorers, and it removes 
subscorers that have reached the end of their documents.  It thus removes the 
dependency on the number of documents indexed, and it performs in 
O(log(number of clauses)) instead of O(number of clauses).

Great. For disjunctions the BooleanScorer in the svn trunk implements
skipTo and it uses a priority queue (a heap) so the base of the log is 2.
Is the btree faster than that?
I had a short look at the code of addToTree, and the btree seems to be
unbalanced, so it could theoretically degrade to linear performance,
but in practice it could be good.
Did you try it with a PrefixQuery or a RangeQuery that expands to a lot
of terms?

One way to find out about performance of various boolean scorers
is to start from the TestDisjunctionPerf1.java here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=34193
This might involve some class renaming in order to be able to
run under the same class loader.

Working in chunks as the 1.4.3 scorer does still has the best
performance afaik.

The svn trunk also has a test case TestBoolean2 in the test
package org.apache.lucene.search.
I hope the btree version passes this, it caught a few initial mistakes
in the current BooleanScorer2.
Still, I think there is a shortage of test cases for the boolean query and
scorers.

The version posted here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=34154
splits off the coordination computations in case they are not
needed, which might affect performance also.

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org