Posted to dev@lucene.apache.org by Paul Elschot <pa...@xs4all.nl> on 2004/07/30 23:17:50 UTC

FilteringQuery.java

Dear developers,

At the moment IndexSearcher.search(Query, Filter) computes a score
for every document matching the query before checking the filter.

With the BitSet.nextSetBit() method one might implement a
filter as a required clause in a Query. This would even allow the eventual use of
ConjunctionScorer and skipTo() in appropriate circumstances, currently
when all other clauses are required.
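
To make the skipTo() idea concrete, here is a rough, standalone sketch of how
two document sets can be intersected by letting each side skip ahead to the
other's current document. It uses plain BitSets for both sides; the class and
method names (LeapfrogSketch, intersect) are invented for the illustration and
are not part of Lucene or of the class posted in this message.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class LeapfrogSketch {

  /** Returns the document numbers set in both bit sets, in increasing order. */
  static List<Integer> intersect(BitSet queryDocs, BitSet filterDocs) {
    List<Integer> result = new ArrayList<Integer>();
    int doc = queryDocs.nextSetBit(0);
    while (doc >= 0) {
      // Equivalent of skipTo(doc) on the filter side.
      int filtered = filterDocs.nextSetBit(doc);
      if (filtered < 0) {
        break; // the filter has no more documents
      }
      if (filtered == doc) {
        result.add(doc); // both sides agree on this document
        doc = queryDocs.nextSetBit(doc + 1);
      } else {
        // The filter is ahead: let the query side skip to it.
        doc = queryDocs.nextSetBit(filtered);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    BitSet queryDocs = new BitSet();
    queryDocs.set(1); queryDocs.set(4); queryDocs.set(7); queryDocs.set(9);
    BitSet filterDocs = new BitSet();
    filterDocs.set(4); filterDocs.set(8); filterDocs.set(9);
    System.out.println(intersect(queryDocs, filterDocs)); // prints [4, 9]
  }
}
```

The point of the sketch is only that documents rejected by the filter are
skipped without any per-document work on the query side.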

Below is a Query that intends to do this.
It compiles against current CVS, but it has not yet been tested.
Before I start writing test code I'd like to have some comments.

For very large indexes and relatively small numbers of filtered docs,
a similar filter could be used with something sparser than a full BitSet,
e.g. a byte array of VInts with the differences between the document numbers.
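
A minimal sketch of such a sparse representation, assuming strictly increasing
document numbers as input. Each gap to the previous document number is written
in the low-bits-first variable-length byte layout that Lucene uses for VInts.
The class and method names (VIntDocNrsSketch, encode, DocNrIterator) are
hypothetical, and skipTo() here is a plain linear scan, not an optimized skip.

```java
import java.io.ByteArrayOutputStream;

public class VIntDocNrsSketch {

  /** Encodes strictly increasing document numbers as VInt-coded gaps. */
  static byte[] encode(int[] sortedDocNrs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int previous = -1;
    for (int docNr : sortedDocNrs) {
      int gap = docNr - previous; // at least 1 for strictly increasing input
      while ((gap & ~0x7F) != 0) {
        out.write((gap & 0x7F) | 0x80); // 7 data bits plus continuation bit
        gap >>>= 7;
      }
      out.write(gap); // last byte, continuation bit clear
      previous = docNr;
    }
    return out.toByteArray();
  }

  /** Iterator with the doc()/next()/skipTo() shape needed by a Scorer. */
  static class DocNrIterator {
    private final byte[] bytes;
    private int pos = 0;
    private int currentDoc = -1;

    DocNrIterator(byte[] bytes) { this.bytes = bytes; }

    int doc() { return currentDoc; }

    /** Decodes the next gap; returns false when the bytes are exhausted. */
    boolean next() {
      if (pos >= bytes.length) return false;
      int b = bytes[pos++];
      int gap = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = bytes[pos++];
        gap |= (b & 0x7F) << shift;
      }
      currentDoc += gap;
      return true;
    }

    /** Advances at least once, then until doc() >= target (linear scan). */
    boolean skipTo(int target) {
      do {
        if (!next()) return false;
      } while (currentDoc < target);
      return true;
    }
  }

  public static void main(String[] args) {
    DocNrIterator it = new DocNrIterator(encode(new int[] {3, 130, 131, 100000}));
    while (it.next()) {
      System.out.println(it.doc());
    }
  }
}
```

Whether this beats a BitSet depends on how sparse the filter is: the cost is
roughly one to a few bytes per filtered document instead of one bit per
document in the index.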

Regards,
Paul.

Here it is, FilteringQuery.java, under Apache 2.0 licence:

package org.apache.lucene.search;

import java.util.BitSet;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public abstract class FilteringQuery extends Query {
  Filter filter;
  String filterName;
  
  public FilteringQuery(Filter filter, String filterName) {
    this.filter = filter; /* should be non null */
    this.filterName = filterName; /* for explanations */
  }
  
  protected String getFilterExplanation() {
    return (filterName != null) ? filterName : filter.toString();
  }
  
  /** Prints this <code>FilteringQuery</code> to a <code>String</code>.
   * @param field Should be null because a FilteringQuery depends on a filter.
   */
  public String toString(String field) {
    String res = "FilteringQuery( " + getFilterExplanation() + ")";
    if (field == null)
      return res;
    else
      return res + "(" + field + " ?)";
  }
  
  /** Prints this query to a string. */
  public String toString() { return toString(null); }
  
  /** Expert:
   * @return <code>null</code>. No similarity is used for scoring a <code>FilteringQuery</code>.
   */
  public Similarity getSimilarity(Searcher searcher) {return null;}

  /** Expert: Apply the Filter and use the result in another Query which
   * extends BooleanQuery to have ConjunctionScorer used when this Query is required.
   */
  public Query rewrite(IndexReader reader) throws IOException {
    
    class SkipReaderBitsQuery extends Query {
      /** Prints this to a <code>String</code>.
       * @param  field Should be null.
       */
      public String toString(String field) {
        String res = "SkipReaderBitsQuery( " + getFilterExplanation() + ")";
        if (field == null)
          return res;
        else
          return res + "(" + field + " ?)";
      }
      
      /** Expert: Constructs a Weight implementation for this <code>SkipReaderBitsQuery</code>.
       * <p>Only implemented by primitive queries, which re-write to themselves.
       */
      protected Weight createWeight(final Searcher searcher) {
        
        class FilterWeight implements Weight {
          public float getValue() {return 0.0f;}

          public void normalize(float norm) {}

          public float sumOfSquaredWeights() {return 0.0f;}

          public Query getQuery() {return FilteringQuery.this;}

          public Explanation explain(IndexReader reader, int doc) {
            return new Explanation(getValue(), "weightless " + getFilterExplanation());
          }

          public Scorer scorer(final IndexReader reader) throws IOException {

            class SkipReaderBitsScorer extends Scorer {
              BitSet docNrs;
              int currentDoc;
              
              FilterReaderBitsScorer(Similarity similarity) throws IOException {
                super(similarity);
                /* CHECKME: ok not to compute the bits earlier? */
                docNrs = FilteringQuery.this.filter.bits(reader);
                currentDoc = -1;
              }
              
              public int doc() {return currentDoc;}

              public float score() {return 0.0f;}
              
              /* should not be called after returning false */
              public boolean next() {
                currentDoc = docNrs.nextSetBit(currentDoc + 1); /* -1 when no next bit */
                return currentDoc >= 0;
              }
              
              /* should not be called after returning false */
              public boolean skipTo(int target) {
                currentDoc = docNrs.nextSetBit((currentDoc < target) ? target : (currentDoc + 1));
                return currentDoc >= 0;
              }
              
              public Explanation explain(int doc) {
                skipTo(doc);
                return new Explanation(score() /* zero anyway */,
                                        "document " + doc + " "
                                        + ((currentDoc == doc) 
                                            ? "matches"
                                            : "does not match"
                                          )
                                        + " filter: " + getFilterExplanation());
              }
            }

            return new SkipReaderBitsScorer(getSimilarity(searcher));
          }
        }
        
        return new FilterWeight();
      }
    }
    
    return new SkipReaderBitsQuery();
  }
}


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Re: FilteringQuery.java, Filter

Posted by Paul Elschot <pa...@xs4all.nl>.
On Friday 30 July 2004 23:29, Robert Engels wrote:
> I thought that in the next release we were going to change 'Filter' to an interface, with a
> definition
>
> interface Filter {
>    boolean accept(Document doc);
> }
>
> Is this not going to happen?

I don't know, I wasn't involved in that.

I'd rather have the BitSet in the current filter changed into a DocNrFilter
and leave the current filter as it is for backward compatibility. So how about:

interface DocNrFilter { /* new interface for BitSet, Set and other implementations */
  boolean accept(int docNr);
}

However, I'd like to have a bit more functionality in there to support doc(), next()
and skipTo(), i.e. a document number iterator as needed by a Scorer.
It would be a waste not to use BitSet.nextSetBit().
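
As a rough sketch of what that extra functionality might look like: the
DocNrFilter interface is the one proposed above, while DocNrIterator and
BitSetDocNrFilter are hypothetical names introduced only for this example.
Nothing here is an existing Lucene API.

```java
import java.util.BitSet;

public class DocNrFilterSketch {

  /** The membership test proposed in this message. */
  interface DocNrFilter {
    boolean accept(int docNr);
  }

  /** Hypothetical addition: iteration in increasing document number order. */
  interface DocNrIterator {
    int doc();
    boolean next();
    boolean skipTo(int target);
  }

  /** BitSet-backed implementation of both interfaces. */
  static class BitSetDocNrFilter implements DocNrFilter, DocNrIterator {
    private final BitSet bits;
    private int currentDoc = -1;

    BitSetDocNrFilter(BitSet bits) { this.bits = bits; }

    public boolean accept(int docNr) { return bits.get(docNr); }

    public int doc() { return currentDoc; }

    /* should not be called after returning false */
    public boolean next() {
      currentDoc = bits.nextSetBit(currentDoc + 1); // -1 when no next bit
      return currentDoc >= 0;
    }

    /* always advances; should not be called after returning false */
    public boolean skipTo(int target) {
      currentDoc = bits.nextSetBit(Math.max(target, currentDoc + 1));
      return currentDoc >= 0;
    }
  }

  public static void main(String[] args) {
    BitSet bits = new BitSet();
    bits.set(2); bits.set(5); bits.set(8);
    BitSetDocNrFilter filter = new BitSetDocNrFilter(bits);
    while (filter.next()) {
      System.out.println(filter.doc());
    }
  }
}
```

A Set-based or VInt-based implementation could offer the same two interfaces,
so a Scorer would not need to know which representation is behind the filter.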

Thinking about it, the current FilteredQuery might be reimplemented using a 
FilteringQuery. I might give that a try one of these days.

(Rereading the posted FilteringQuery.java I see that it doesn't compile
as it is: the constructor for class SkipReaderBitsScorer is
still named FilterReaderBitsScorer, sorry.)

Regards,
Paul




RE: FilteringQuery.java

Posted by Robert Engels <re...@ix.netcom.com>.
I thought that in the next release we were going to change 'Filter' to an interface, with a
definition

interface Filter {
   boolean accept(Document doc);
}

Is this not going to happen?

Robert Engels
