Posted to dev@lucene.apache.org by Paul Elschot <pa...@xs4all.nl> on 2007/04/13 23:05:10 UTC
Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730
On Friday 13 April 2007 22:10, eks dev wrote:
> Hoss, would this work (is this what you said)?
>
> public BitSet bits(IndexReader reader) throws IOException{
> return null;
> }
>
> public Matcher getMatcher(IndexReader reader) throws IOException {
> if(bits() == null) throw new SomeException("Filter must implement at least one of...");
> return new BitsMatcher(bits());
> }
This will not work correctly when the Scorer for the query that is searched
with a filter does not implement skipTo(), for example BooleanScorer.
See also the javadoc of class IndexSearcher in the patch.
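To make the failure mode concrete, here is a minimal sketch. It assumes hypothetical, simplified types (SimpleScorer, ArrayScorer, FilterSketch) that are not Lucene's API and not the code in the patch. It contrasts filtering by checking bits on each document reached with next() against filtering that drives the scorer with skipTo(); the second approach throws as soon as the scorer does not support skipTo().

```java
import java.util.BitSet;

// Hypothetical, simplified stand-in for the Scorer discussed above (not Lucene's API).
interface SimpleScorer {
    boolean next();             // move to the next matching doc; false when exhausted
    int doc();                  // current doc id, valid after next()/skipTo() returned true
    boolean skipTo(int target); // advance at least one doc, stop at first doc >= target
}

// Array-backed scorer; with canSkip == false it mimics a scorer that only supports next().
class ArrayScorer implements SimpleScorer {
    private final boolean canSkip;
    private final int[] docs;
    private int pos = -1;

    ArrayScorer(boolean canSkip, int... docs) {
        this.canSkip = canSkip;
        this.docs = docs;
    }

    public boolean next() { return ++pos < docs.length; }

    public int doc() { return docs[pos]; }

    public boolean skipTo(int target) {
        if (!canSkip) throw new UnsupportedOperationException();
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }
}

class FilterSketch {
    // Bits style: iterate with next() only and test each doc against the filter bits.
    static int countWithBits(SimpleScorer scorer, BitSet bits) {
        int hits = 0;
        while (scorer.next()) {
            if (bits.get(scorer.doc())) hits++;
        }
        return hits;
    }

    // Matcher style: the filter proposes the next candidate and the scorer skips to it.
    static int countWithSkipTo(SimpleScorer scorer, BitSet bits) {
        int hits = 0;
        int filterDoc = bits.nextSetBit(0);
        boolean more = filterDoc >= 0 && scorer.skipTo(filterDoc);
        while (more) {
            int scorerDoc = scorer.doc();
            if (scorerDoc == filterDoc) {
                hits++;
                filterDoc = bits.nextSetBit(filterDoc + 1);
                more = filterDoc >= 0 && scorer.skipTo(filterDoc);
            } else {
                // The scorer is ahead of the filter: move the filter up to the scorer.
                filterDoc = bits.nextSetBit(scorerDoc);
                if (filterDoc < 0) {
                    more = false;
                } else if (filterDoc > scorerDoc) {
                    more = scorer.skipTo(filterDoc);
                }
                // If filterDoc == scorerDoc the next loop pass counts the match.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        for (int d : new int[] {1, 3, 5, 8}) bits.set(d);
        System.out.println(countWithBits(new ArrayScorer(false, 1, 2, 3, 8, 9), bits));
        System.out.println(countWithSkipTo(new ArrayScorer(true, 1, 2, 3, 8, 9), bits));
        try {
            countWithSkipTo(new ArrayScorer(false, 1, 2, 3, 8, 9), bits);
        } catch (UnsupportedOperationException e) {
            System.out.println("skipTo() not supported by this scorer");
        }
    }
}
```

In this sketch the bits-style loop works with either kind of scorer, which is presumably why the patch keeps it for the case where skipTo() cannot be relied on.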
LUCENE-730 explicitly uses BooleanScorer, but only for the non filtered case
with a top level disjunction.
I think that with LUCENE-730 also added, the filtered case with BooleanScorer
would go away, which would allow this logic in IndexSearcher to be simplified.
This simplification of IndexSearcher is not in the LUCENE-730 patch, because
LUCENE-584 is not committed. At the moment I don't know precisely what
IndexSearcher would look like after LUCENE-730.
With LUCENE-730 BooleanScorer.setUseScorer14() could also be
removed/deprecated, but that is also not yet in the LUCENE-730 patch.
Regards,
Paul Elschot
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730
Posted by Paul Elschot <pa...@xs4all.nl>.
Hoss,
A bit long, sorry for that, sometimes things are just as complex as they are.
On Saturday 14 April 2007 01:13, Chris Hostetter wrote:
>
...
>
> I don't get it, how would a Scorer not implement skipTo? ...oh...
>
> final class BooleanScorer extends Scorer {
> ...
> public boolean skipTo(int target) {
> throw new UnsupportedOperationException();
> }
Some history for the underlying reason for this:
Once upon a time no Scorer would implement skipTo().
Most people would use BooleanScorer for queries with multiple terms, and
things worked well with the Scorer.next() method, especially for
disjunctions. Occasionally documents would be scored out of document order,
but that did not lead to problems because Hits would reorder the documents by
score value anyway.
Then skipTo() was added to improve the speed of conjunctions. To do this each
Scorer needs to score all documents in document number order and implement
skipTo(), because skipTo() is used by ConjunctionScorer. BooleanScorer will
only use ConjunctionScorer in very specific (but also frequently occurring)
circumstances. At this point the index format was also changed to include the
skip forward information.
As I said, the implementation of disjunctions in BooleanScorer does not score
documents strictly in document order. It can be made to do that, but that
would lead to some loss of performance. BooleanScorer uses a kind of
distributive sort that is faster than the priority queue used by
DisjunctionSumScorer.
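The ordering difference between the two approaches can be sketched as follows. This is an illustration only, under stated assumptions: DisjunctionSketch, WINDOW, bucketOrder and queueOrder are hypothetical names, the postings are plain sorted int arrays, scores are left out, and the code is neither the actual BooleanScorer nor DisjunctionSumScorer. It shows why a bucket table can report documents out of document order while a priority queue cannot; it does not measure speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class DisjunctionSketch {
    static final int WINDOW = 8; // hypothetical window size for the bucket table

    // Bucket approach: each term in turn drops its docs for the current window into
    // a small table; hits are reported in the order the buckets were first touched.
    // Assumes every doc id in postings is smaller than maxDoc.
    static List<Integer> bucketOrder(int[][] postings, int maxDoc) {
        List<Integer> out = new ArrayList<>();
        int[] pos = new int[postings.length];
        for (int start = 0; start < maxDoc; start += WINDOW) {
            int end = start + WINDOW;
            boolean[] seen = new boolean[WINDOW];
            List<Integer> touched = new ArrayList<>();
            for (int t = 0; t < postings.length; t++) {
                while (pos[t] < postings[t].length && postings[t][pos[t]] < end) {
                    int doc = postings[t][pos[t]++];
                    if (!seen[doc - start]) {
                        seen[doc - start] = true;
                        touched.add(doc);
                    }
                }
            }
            out.addAll(touched);
        }
        return out;
    }

    // Priority-queue approach: always take the term whose current doc is smallest,
    // so docs come out strictly in document order.
    static List<Integer> queueOrder(int[][] postings) {
        // Each queue entry is {term index, position in that term's postings}.
        PriorityQueue<int[]> pq = new PriorityQueue<>(
            (a, b) -> Integer.compare(postings[a[0]][a[1]], postings[b[0]][b[1]]));
        for (int t = 0; t < postings.length; t++) {
            if (postings[t].length > 0) pq.add(new int[] {t, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int doc = postings[top[0]][top[1]];
            if (out.isEmpty() || out.get(out.size() - 1) != doc) out.add(doc);
            if (++top[1] < postings[top[0]].length) pq.add(top);
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] postings = { {1, 6, 9}, {2, 6, 12} };
        System.out.println(bucketOrder(postings, 16)); // order within a window may vary
        System.out.println(queueOrder(postings));      // strictly increasing doc ids
    }
}
```

Both methods report the same set of documents; only the order differs, and the bucket version avoids maintaining a heap for every document.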
Then BooleanScorer2 came along. BooleanScorer2 uses ConjunctionScorer in more
circumstances than BooleanScorer, and it uses DisjunctionSumScorer for
disjunctions. LUCENE-730 is an attempt to get the top level disjunction
performance of BooleanScorer back.
Disjunctions below top level, for example in a query like this:
+(a1 a2) +(b1 b2)
need skipTo() (called from ConjunctionScorer) on the two nested disjunctions,
and for that DisjunctionSumScorer is used. Currently for the top level
disjunction case:
a1 a2 b1 b2
DisjunctionSumScorer is normally used. But when the setUseScorer14() method is
used, BooleanScorer will (always?) be used. The patch at LUCENE-584 tries to
handle this setUseScorer14() case by also keeping the old filtering method
that checks the Bits individually in IndexSearcher.
LUCENE-730 will always use BooleanScorer for the top level disjunctions, so
with a bit of luck the setUseScorer14 method can also be deprecated/removed.
LUCENE-584 has another possible performance advantage in that it allows an
implementation of filtering by using a ConjunctionScorer directly instead of
doing the filtering in IndexSearcher, but that still needs to be added.
Regards,
Paul Elschot
Re: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet; relation with LUCENE-730
Posted by Chris Hostetter <ho...@fucit.org>.
: > Hoss, would this work (is this what you said)?
: > public Matcher getMatcher(IndexReader reader) throws IOException {
: > if(bits() == null) throw new SomeException("Filter must implement at least one of...");
: > return new BitsMatcher(bits());
: > }
Assuming BitsMatcher does what i think it does then yes, that's what i had
in mind ... i was specifically saying to make a default Matcher
implementation out of the code in the patched version of IndexSearcher
that has the comment...
+ } else { // bits for filtering, skipTo() not used on scorer:
: This will not work correctly when the Scorer for the query that is searched
: with a filter does not implement skipTo(), for example BooleanScorer.
: See also the javadoc of class IndexSearcher in the patch.
I don't get it, how would a Scorer not implement skipTo? ...oh...
final class BooleanScorer extends Scorer {
...
public boolean skipTo(int target) {
throw new UnsupportedOperationException();
}
...so lemme see if i understand this:
What's happening in the current trunk is that the only situations
in which code will attempt to call skipTo on a Scorer are:
a) From the score(HitCollector hc) method of the same Scorer class
(you should know if you support it, you're in the class)
b) From the skipTo method of an enclosing Scorer
(If you "add" Scorer X to a wrapper Scorer Y, and Y implements
skipTo, it can assume that X implements skipTo).
Am I correct so far?
In the latest version of the Matcher patch...
https://issues.apache.org/jira/secure/attachment/12352057/Matcher20070226.patch
...this changes, such that IndexSearcher will assume a Scorer supports
skipTo iff a Filter is used which implements getMatcher (I guess the
assumption being that if the code being used is new enough to support Matchers, it's
new enough to support Scorer.skipTo). *BUT* if it's an "old" Filter using
a BitSet the code in IndexSearcher will continue with the same old
assumptions about the Scorer.
And the change eks describes (which is a much better way to describe what
i was suggesting) would break this safety net by always assuming skipTo
was safe to call.
So really the issue is that the patch assumes one thing (Scorer supports
skipTo) based on the presence of something that should be thought of as
"newer" (Filter supports getMatcher) and relies on documentation to
enforce this.
Am I caught up now?
Off the top of my head, the best solution i can think of to this issue
would be to add the naive implementation of skipTo to Scorer, remove
the UnsupportedOperationException of skipTo from all Scorers in the core,
and rev Lucene to version 3.0, since this would probably be considered a
serious API change (method sigs don't change, but now we're requiring
people to implement a method that we have said in the past (by example)
can be Unsupported).
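A naive default of that kind could look roughly like the sketch below. SketchScorer and ListScorer are hypothetical names, not Lucene classes, and the sketch assumes that next() returns documents in increasing document order. A scorer that can return documents out of order, as described for BooleanScorer elsewhere in this thread, would presumably need more than this default.

```java
// Hypothetical base class, loosely modeled on the Scorer discussed in this thread.
abstract class SketchScorer {
    abstract boolean next(); // move to the next matching doc; false when exhausted
    abstract int doc();      // current doc id, valid after next()/skipTo() returned true

    // Naive default: step forward with next() until the current doc reaches target.
    // Always advances at least one doc, and is only correct if next() is in doc order.
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (doc() < target);
        return true;
    }
}

// Minimal subclass that implements only next() and doc() and inherits skipTo().
class ListScorer extends SketchScorer {
    private final int[] docs;
    private int pos = -1;

    ListScorer(int... docs) { this.docs = docs; }

    boolean next() { return ++pos < docs.length; }

    int doc() { return docs[pos]; }
}

class NaiveSkipDemo {
    public static void main(String[] args) {
        ListScorer scorer = new ListScorer(2, 4, 7, 10);
        if (scorer.skipTo(5)) {
            System.out.println("first doc at or after 5: " + scorer.doc());
        }
    }
}
```

The default costs one next() call per skipped document, so it gives correctness for callers that rely on skipTo() but none of the speedup that a skip-aware implementation provides.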
In general i'm not fond of assuming Scorer.skipTo when Filter.getMatcher
... the concepts are really orthogonal and even if it's a decent
assumption to make today, it doesn't help us tomorrow when we want to add a
getMatcher method to all of the core Filter classes to improve
performance.
-Hoss