Posted to dev@lucene.apache.org by Paul Elschot <pa...@xs4all.nl> on 2007/04/13 23:05:10 UTC

Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

On Friday 13 April 2007 22:10, eks dev wrote:
> Hoss, would this work (is this what you said)? 
>  
> public BitSet bits(IndexReader reader) throws IOException{
>  return null;
> }
> 
> public Matcher getMatcher(IndexReader reader) throws IOException {
>   if(bits() == null) throw new SomeException("Filter must implement at least one of...");
>   return new BitsMatcher(bits());
> }

This will not work correctly when the Scorer for the query that is searched
with a filter does not implement skipTo(), for example BooleanScorer.
See also the javadoc of class IndexSearcher in the patch.

LUCENE-730 explicitly uses BooleanScorer, but only for the non filtered case
with a top level disjunction.

I think that with LUCENE-730 also added, the filtered case with BooleanScorer
would go away, which would allow this logic in IndexSearcher to be simplified.
This simplification of IndexSearcher is not in the LUCENE-730 patch, because 
LUCENE-584 is not committed. At the moment I don't know precisely what
IndexSearcher would look like after LUCENE-730.

With LUCENE-730 BooleanScorer.setUseScorer14() could also be 
removed/deprecated, but that is also not yet in the LUCENE-730 patch.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

Posted by Paul Elschot <pa...@xs4all.nl>.
Hoss,

A bit long, sorry for that; sometimes things are just as complex as they are.

On Saturday 14 April 2007 01:13, Chris Hostetter wrote:
> 
...
> 
> I don't get it, how would a Scorer not implement skipTo? ...oh...
> 
> 	final class BooleanScorer extends Scorer {
> 	  ...
> 	  public boolean skipTo(int target) {
> 	    throw new UnsupportedOperationException();
> 	  }

Some history for the underlying reason for this:

Once upon a time no Scorer would implement skipTo().
Most people would use BooleanScorer for queries with multiple terms, and 
things worked well with the Scorer.next() method, especially for 
disjunctions. Occasionally documents would be scored out of document order, 
but that did not lead to problems because Hits would reorder the documents by 
score value anyway.

Then skipTo() was added to improve the speed of conjunctions. To do this each 
Scorer needs to score all documents in document number order and implement 
skipTo(), because skipTo() is used by ConjunctionScorer. BooleanScorer will 
only use ConjunctionScorer in very specific (but also frequently occurring) 
circumstances. At this point the index format was also changed to include the 
skip forward information.
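
To illustrate why skipTo() speeds up conjunctions, here is a minimal,
self-contained sketch. SortedDocs and ConjunctionSketch are hypothetical
stand-ins, not Lucene classes, and the binary search only stands in for the
skip information stored in the index. It assumes both inputs deliver
documents in increasing document number order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a scorer over a sorted array of document numbers. */
class SortedDocs {
    private final int[] docs;
    private int pos = -1;

    SortedDocs(int... docs) { this.docs = docs; }

    /** Advances to the next document; returns false when exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Jumps to the first document >= target, always moving forward at least one position. */
    boolean skipTo(int target) {
        int from = pos + 1;
        if (from >= docs.length) { pos = docs.length; return false; }
        // Binary search plays the role of the skip data (an assumption of this sketch).
        int i = Arrays.binarySearch(docs, from, docs.length, target);
        pos = (i >= 0) ? i : -i - 1;
        return pos < docs.length;
    }

    int doc() { return docs[pos]; }
}

class ConjunctionSketch {
    /** Returns the documents present in both inputs, in document order. */
    static List<Integer> intersect(SortedDocs a, SortedDocs b) {
        List<Integer> result = new ArrayList<>();
        if (!a.next() || !b.next()) return result;
        while (true) {
            if (a.doc() == b.doc()) {
                result.add(a.doc());
                if (!a.next() || !b.next()) return result;
            } else if (a.doc() < b.doc()) {
                // The lagging side skips directly to the other side's document.
                if (!a.skipTo(b.doc())) return result;
            } else {
                if (!b.skipTo(a.doc())) return result;
            }
        }
    }
}
```

The lagging input never visits the documents it skips over, which is where
the gain over plain next() comes from when one clause is much rarer than the
other.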

As I said, the implementation of disjunctions in BooleanScorer does not score 
documents strictly in document order. It can be made to do that, but that 
would lead to some loss of performance. BooleanScorer uses a kind of 
distributive sort that is faster than the priority queue used by 
DisjunctionSumScorer.
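
A rough sketch of the bucket idea follows. It is not the actual
BooleanScorer code: the class name, the window size and the collect()
signature are assumptions made for illustration. Each clause drops its
documents into a small table covering one window of document numbers, and
the table is then emitted in first-touch order, so no priority queue is
needed but the output is not in document order within a window.

```java
import java.util.ArrayList;
import java.util.List;

class BucketDisjunctionSketch {
    static final int WINDOW = 8; // illustrative window size, chosen for the example

    /**
     * Returns the matching document numbers in the order they are emitted.
     * Each postings array must be sorted ascending with values in [0, maxDoc).
     */
    static List<Integer> collect(int[][] postings, int maxDoc) {
        List<Integer> emitted = new ArrayList<>();
        int[] pos = new int[postings.length]; // read position per clause
        int[] count = new int[WINDOW];        // matching clauses per bucket
        int[] order = new int[WINDOW];        // buckets touched, in first-touch order
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int touched = 0;
            // Distribute: every clause fills the buckets of the current window.
            for (int c = 0; c < postings.length; c++) {
                int[] docs = postings[c];
                while (pos[c] < docs.length && docs[pos[c]] < base + WINDOW) {
                    int slot = docs[pos[c]++] - base;
                    if (count[slot]++ == 0) order[touched++] = slot;
                }
            }
            // Emit: walk the touched buckets without sorting them.
            for (int i = 0; i < touched; i++) {
                emitted.add(base + order[i]);
                count[order[i]] = 0;
            }
        }
        return emitted;
    }
}
```

For the clauses {1, 5, 9}, {2, 3, 9} and {0, 5} with maxDoc 16 this emits
1, 5, 2, 3, 0, 9: every matching document exactly once, but out of document
order inside the first window.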

Then BooleanScorer2 came along. BooleanScorer2 uses ConjunctionScorer in more 
circumstances than BooleanScorer, and it uses DisjunctionSumScorer for 
disjunctions. LUCENE-730 is an attempt to get the top level disjunction 
performance of BooleanScorer back.

Disjunctions below top level, for example in a query like this:
+(a1 a2) +(b1 b2)
need skipTo() (called from ConjunctionScorer) on the two nested disjunctions, 
and for that DisjunctionSumScorer is used. Currently for the top level 
disjunction case:
a1 a2 b1 b2
DisjunctionSumScorer is normally used. But when the setUseScorer14() method is 
used, BooleanScorer will (always?) be used. The patch at LUCENE-584 tries to 
handle this setUseScorer14() case by also keeping the old filtering method 
that checks the bits individually in IndexSearcher.
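
The old filtering method can be pictured roughly as below. This is a
simplified sketch with hypothetical names (BitsFilterSketch, Collector,
filtered), not the IndexSearcher code itself. The point is that each
collected document is checked against the BitSet on its own, so the scorer
neither needs skipTo() nor has to deliver documents in order.

```java
import java.util.BitSet;

class BitsFilterSketch {
    /** Hypothetical stand-in for a hit collector callback. */
    interface Collector {
        void collect(int doc, float score);
    }

    /** Wraps a collector so that only documents whose bit is set are passed on. */
    static Collector filtered(BitSet bits, Collector delegate) {
        return (doc, score) -> {
            if (bits.get(doc)) {
                delegate.collect(doc, score);
            }
        };
    }
}
```

Feeding the wrapped collector documents 5, 2, 7 with bits 2 and 7 set passes
on 2 and 7 only, regardless of the order in which they arrive.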
LUCENE-730 will always use BooleanScorer for the top level disjunctions, so 
with a bit of luck the setUseScorer14 method can also be deprecated/removed.

LUCENE-584 has another possible performance advantage in that it allows an 
implementation of filtering by using a ConjunctionScorer directly instead of 
doing the filtering in IndexSearcher, but that still needs to be added.

Regards,
Paul Elschot



Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730

Posted by Chris Hostetter <ho...@fucit.org>.
: > Hoss, would this work (is this what you said)?

: > public Matcher getMatcher(IndexReader reader) throws IOException {
: >   if(bits() == null) throw new SomeException("Filter must implement at least one of...");
: >   return new BitsMatcher(bits());
: > }

Assuming BitsMatcher does what i think it does then yes, that's what i had
in mind ... i was specifically saying to make a default Matcher
implementation out of the code in the patched version of IndexSearcher
that has the comment...

+    } else { // bits for filtering, skipTo() not used on scorer:

: This will not work correctly when the Scorer for the query that is searched
: with a filter does not implement skipTo(), for example BooleanScorer.
: See also the javadoc of class IndexSearcher in the patch.

I don't get it, how would a Scorer not implement skipTo? ...oh...

	final class BooleanScorer extends Scorer {
	  ...
	  public boolean skipTo(int target) {
	    throw new UnsupportedOperationException();
	  }

...so lemme see if i understand this:

What's happening in the current trunk is that the only situations
in which code will attempt to call skipTo on a Scorer are:
 a) From the score(HitCollector hc) method of the same Scorer class
    (you should know if you support it, you're in the class)
 b) From the skipTo method of an enclosing Scorer
    (If you "add" Scorer X to a wrapper Scorer Y, and Y implements
    skipTo, it can assume that X implements skipTo).

Am I correct so far?

In the latest version of the Matcher patch...
https://issues.apache.org/jira/secure/attachment/12352057/Matcher20070226.patch
...this changes, such that IndexSearcher will assume a Scorer supports
skipTo iff a Filter is used which implements getMatcher (I guess the
assumption being that if the code being used is new enough to support Matchers, it's
new enough to support Scorer.skipTo).  *BUT* if it's an "old" Filter using
a BitSet the code in IndexSearcher will continue with the same old
assumptions about the Scorer.

And the change eks describes (which is a much better way to describe what
i was suggesting) would break this safety net by always assuming skipTo
was safe to call.
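
The two code paths and the failure mode can be sketched like this. All
names here (FilterDispatchSketch, DocIterator, ArrayDocs, search) are made
up for illustration and the logic is a simplification, not the patched
IndexSearcher. A non-null matcher selects the skipTo()-based path, otherwise
the BitSet path is used.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class FilterDispatchSketch {
    /** Hypothetical common shape of a scorer or a matcher. */
    interface DocIterator {
        boolean next();
        boolean skipTo(int target);
        int doc();
    }

    /** Array-backed iterator; canSkip == false mimics a scorer without skipTo(). */
    static class ArrayDocs implements DocIterator {
        private final int[] docs;
        private final boolean canSkip;
        private int pos = -1;

        ArrayDocs(boolean canSkip, int... docs) {
            this.canSkip = canSkip;
            this.docs = docs;
        }

        public boolean next() { return ++pos < docs.length; }

        public boolean skipTo(int target) {
            if (!canSkip) throw new UnsupportedOperationException();
            do {
                if (!next()) return false;
            } while (docs[pos] < target);
            return true;
        }

        public int doc() { return docs[pos]; }
    }

    /** Uses the matcher path when a matcher is given, else the bits path. */
    static List<Integer> search(DocIterator scorer, DocIterator matcher, BitSet bits) {
        List<Integer> hits = new ArrayList<>();
        if (matcher != null) {
            // Matcher path: assumes scorer.skipTo() is supported.
            boolean more = matcher.next() && scorer.skipTo(matcher.doc());
            while (more) {
                if (scorer.doc() == matcher.doc()) {
                    hits.add(scorer.doc());
                    more = matcher.next() && scorer.skipTo(matcher.doc());
                } else if (scorer.doc() < matcher.doc()) {
                    more = scorer.skipTo(matcher.doc());
                } else {
                    more = matcher.skipTo(scorer.doc());
                }
            }
        } else {
            // Bits path: only next() is called on the scorer.
            while (scorer.next()) {
                if (bits.get(scorer.doc())) hits.add(scorer.doc());
            }
        }
        return hits;
    }
}
```

In this sketch a scorer built with canSkip == false works on the bits path
but throws UnsupportedOperationException as soon as a matcher is supplied,
which is the situation a default getMatcher() built from bits() would
create for every old Filter.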

So really the issue is that the patch assumes one thing (Scorer supports
skipTo) based on the presence of something that should be thought of as
"newer" (Filter supports getMatcher) and relies on documentation to
enforce this.

Am I caught up now?

Off the top of my head, the best solution i can think of to this issue
would be to add the naive implementation of skipTo to Scorer, remove
the UnsupportedOperationException of skipTo from all Scorers in the core,
and rev Lucene to version 3.0 since this would probably be considered a
serious API change (method sigs don't change, but now we're requiring
people to implement a method that we have said in the past (by example)
can be Unsupported).
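
A naive default of that kind might look like the sketch below. SketchScorer
and ArraySketchScorer are hypothetical names, not the real Scorer class, and
the loop is only correct under the assumption that next() delivers
documents in increasing document number order.

```java
/** Hypothetical base class with a naive default skipTo() built on next(). */
abstract class SketchScorer {
    abstract boolean next();

    abstract int doc();

    /** Advances at least once, then keeps calling next() until doc() >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (target > doc());
        return true;
    }
}

/** In-order example scorer that only implements next() and doc(). */
class ArraySketchScorer extends SketchScorer {
    private final int[] docs;
    private int pos = -1;

    ArraySketchScorer(int... docs) { this.docs = docs; }

    boolean next() { return ++pos < docs.length; }

    int doc() { return docs[pos]; }
}
```

A subclass that can do better simply overrides skipTo(). A scorer that
returns documents out of order would still need to be changed before this
default gives correct results.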

In general i'm not fond of assuming Scorer.skipTo is supported whenever
Filter.getMatcher is ... the concepts are really orthogonal and even if it's
a decent assumption to make today, it doesn't help us tomorrow when we want
to add a getMatcher method to all of the core Filter classes to improve
performance.



-Hoss

