You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Ignacio Vera (Jira)" <ji...@apache.org> on 2021/11/29 09:02:00 UTC

[jira] [Commented] (LUCENE-9619) Move Points from a visitor API to a cursor-style API?

    [ https://issues.apache.org/jira/browse/LUCENE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450277#comment-17450277 ] 

Ignacio Vera commented on LUCENE-9619:
--------------------------------------

In LUCENE-9820 we have done the first step to move the API but still the methods #visitDocsIds and #visitDocValues are using the IntersectVisitor as an input.  Here I am proposing to introduce two functional interfaces {{DocIdsVisitor}} and {{DocValuesVisitor}} to use them as the input for those methods so the API would look like:

 
{code:java}
/**
 * Basic operations to read the KD-tree.
 *
 * @lucene.experimental
 */
public interface PointTree extends Cloneable {

  /** Clone, the current node becomes the root of the new tree. */
  PointTree clone();

  /**
   * Move to the first child node and return {@code true} upon success. Returns {@code false} for
   * leaf nodes and {@code true} otherwise.
   */
  boolean moveToChild() throws IOException;

  /**
   * Move to the next sibling node and return {@code true} upon success. Returns {@code false} if
   * the current node has no more siblings.
   */
  boolean moveToSibling() throws IOException;

  /**
   * Move to the parent node and return {@code true} upon success. Returns {@code false} for the
   * root node and {@code true} otherwise.
   */
  boolean moveToParent() throws IOException;

  /** Return the minimum packed value of the current node. */
  byte[] getMinPackedValue();

  /** Return the maximum packed value of the current node. */
  byte[] getMaxPackedValue();

  /** Return the number of points below the current node. */
  long size();

  /** Visit all the docs below the current node. */
  void visitDocIDs(DocIdsVisitor docIdsVisitor) throws IOException;

  /** Visit all the docs and values below the current node. */
  default void visitDocValues(DocValuesVisitor docValuesVisitor) throws IOException {
    visitDocValues((min, max) -> Relation.CELL_CROSSES_QUERY, docID -> {}, docValuesVisitor);
  }

  /**
   * Similar to {@link #visitDocValues(DocValuesVisitor)} but in this case it allows adding a
   * filter that works like {@link IntersectVisitor#compare(byte[], byte[])}.
   */
  void visitDocValues(
      BiFunction<byte[], byte[], Relation> compare,
      DocIdsVisitor docIdsVisitor,
      DocValuesVisitor docValuesVisitor)
      throws IOException;
}

/**
 * Collects all documents below a tree node by calling {@link
 * PointTree#visitDocIDs(DocIdsVisitor)}
 */
@FunctionalInterface
public interface DocIdsVisitor {
  /** Called for all documents below a tree node. */
  void visit(int docID) throws IOException;
}

/**
 * Collects all documents and values below a tree node by calling {@link
 * PointTree#visitDocValues(DocValuesVisitor)} (DocIdsVisitor)}
 */
@FunctionalInterface
public interface DocValuesVisitor {
  /** Called for all documents and values below a tree node. */
  void visit(int docID, byte[] packedValue) throws IOException;

  /**
   * Similar to {@link DocValuesVisitor#visit(int, byte[])} but in this case the packedValue can
   * have more than one docID associated to it. The provided iterator should not escape the scope
   * of this method so that implementations of PointValues are free to reuse it.
   */
  default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
    int docID;
    while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
      visit(docID, packedValue);
    }
  }
}

/**
 * We recurse the {@link PointTree}, using a provided instance of this to guide the recursion.
 *
 * @lucene.experimental
 */
public interface IntersectVisitor extends DocValuesVisitor, DocIdsVisitor {

  /**
   * Called for non-leaf cells to test how the cell relates to the query, to determine how to
   * further recurse down the tree.
   *
   * <ul>
   *   <li>{@link Relation#CELL_OUTSIDE_QUERY}: Stop recursing down the current branch of the
   *       tree.
   *   <li>{@link Relation#CELL_INSIDE_QUERY}: All nodes below the current node are visited using
   *       the underlying {@link DocIdsVisitor}. he consumer should generally blindly accept the
   *       docID.
   *   <li>{@link Relation#CELL_CROSSES_QUERY}: Keep recursing down the current branch of the
   *       tree. If the current node is a leaf, visit all docs and values usinng the underlying
   *       {@link DocValuesVisitor}. The consumer should scrutinize the packedValue to decide
   *       whether to accept it.
   * </ul>
   */
  Relation compare(byte[] minPackedValue, byte[] maxPackedValue);

  /** Notifies the caller that this many documents are about to be visited */
  default void grow(int count) {}
} {code}
 

Any thoughts?

> Move Points from a visitor API to a cursor-style API?
> -----------------------------------------------------
>
>                 Key: LUCENE-9619
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9619
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Points' visitor API work well but there are a couple things we could make better if we moved to a cursor API, e.g.
>  - Term queries could return a DocIdSetIterator without having to materialize a BitSet.
>  - Nearest-neighbor search could work on top of the regular API instead of casting to BKDReader https://github.com/apache/lucene-solr/blob/6a7131ee246d700c2436a85ddc537575de2aeacf/lucene/sandbox/src/java/org/apache/lucene/sandbox/document/FloatPointNearestNeighbor.java#L296
>  - We could optimize counting the number of matches of a query by adding the number of points in a leaf without visiting documents where there are no deleted documents and a leaf fully matches the query.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org