You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "Ignacio Vera (Jira)" <ji...@apache.org> on 2021/11/29 09:02:00 UTC
[jira] [Commented] (LUCENE-9619) Move Points from a visitor API to a cursor-style API?
[ https://issues.apache.org/jira/browse/LUCENE-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450277#comment-17450277 ]
Ignacio Vera commented on LUCENE-9619:
--------------------------------------
In LUCENE-9820 we have done the first step to move the API but still the methods #visitDocsIds and #visitDocValues are using the IntersectVisitor as an input. Here I am proposing to introduce two functional interfaces {{DocIdsVisitor}} and {{DocValuesVisitor}} to use them as the input for those methods so the API would look like:
{code:java}
/**
* Basic operations to read the KD-tree.
*
* @lucene.experimental
*/
public interface PointTree extends Cloneable {
/** Clone, the current node becomes the root of the new tree. */
PointTree clone();
/**
* Move to the first child node and return {@code true} upon success. Returns {@code false} for
* leaf nodes and {@code true} otherwise.
*/
boolean moveToChild() throws IOException;
/**
* Move to the next sibling node and return {@code true} upon success. Returns {@code false} if
* the current node has no more siblings.
*/
boolean moveToSibling() throws IOException;
/**
* Move to the parent node and return {@code true} upon success. Returns {@code false} for the
* root node and {@code true} otherwise.
*/
boolean moveToParent() throws IOException;
/** Return the minimum packed value of the current node. */
byte[] getMinPackedValue();
/** Return the maximum packed value of the current node. */
byte[] getMaxPackedValue();
/** Return the number of points below the current node. */
long size();
/** Visit all the docs below the current node. */
void visitDocIDs(DocIdsVisitor docIdsVisitor) throws IOException;
/** Visit all the docs and values below the current node. */
default void visitDocValues(DocValuesVisitor docValuesVisitor) throws IOException {
visitDocValues((min, max) -> Relation.CELL_CROSSES_QUERY, docID -> {}, docValuesVisitor);
}
/**
* Similar to {@link #visitDocValues(DocValuesVisitor)} but in this case it allows adding a
* filter that works like {@link IntersectVisitor#compare(byte[], byte[])}.
*/
void visitDocValues(
BiFunction<byte[], byte[], Relation> compare,
DocIdsVisitor docIdsVisitor,
DocValuesVisitor docValuesVisitor)
throws IOException;
}
/**
* Collects all documents below a tree node by calling {@link
* PointTree#visitDocIDs(DocIdsVisitor)}
*/
@FunctionalInterface
public interface DocIdsVisitor {
/** Called for all documents below a tree node. */
void visit(int docID) throws IOException;
}
/**
* Collects all documents and values below a tree node by calling {@link
* PointTree#visitDocValues(DocValuesVisitor)} (DocIdsVisitor)}
*/
@FunctionalInterface
public interface DocValuesVisitor {
/** Called for all documents and values below a tree node. */
void visit(int docID, byte[] packedValue) throws IOException;
/**
* Similar to {@link DocValuesVisitor#visit(int, byte[])} but in this case the packedValue can
* have more than one docID associated to it. The provided iterator should not escape the scope
* of this method so that implementations of PointValues are free to reuse it.
*/
default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
int docID;
while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
visit(docID, packedValue);
}
}
}
/**
* We recurse the {@link PointTree}, using a provided instance of this to guide the recursion.
*
* @lucene.experimental
*/
public interface IntersectVisitor extends DocValuesVisitor, DocIdsVisitor {
/**
* Called for non-leaf cells to test how the cell relates to the query, to determine how to
* further recurse down the tree.
*
* <ul>
* <li>{@link Relation#CELL_OUTSIDE_QUERY}: Stop recursing down the current branch of the
* tree.
* <li>{@link Relation#CELL_INSIDE_QUERY}: All nodes below the current node are visited using
* the underlying {@link DocIdsVisitor}. he consumer should generally blindly accept the
* docID.
* <li>{@link Relation#CELL_CROSSES_QUERY}: Keep recursing down the current branch of the
* tree. If the current node is a leaf, visit all docs and values usinng the underlying
* {@link DocValuesVisitor}. The consumer should scrutinize the packedValue to decide
* whether to accept it.
* </ul>
*/
Relation compare(byte[] minPackedValue, byte[] maxPackedValue);
/** Notifies the caller that this many documents are about to be visited */
default void grow(int count) {}
} {code}
Any thoughts?
> Move Points from a visitor API to a cursor-style API?
> -----------------------------------------------------
>
> Key: LUCENE-9619
> URL: https://issues.apache.org/jira/browse/LUCENE-9619
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Points' visitor API work well but there are a couple things we could make better if we moved to a cursor API, e.g.
> - Term queries could return a DocIdSetIterator without having to materialize a BitSet.
> - Nearest-neighbor search could work on top of the regular API instead of casting to BKDReader https://github.com/apache/lucene-solr/blob/6a7131ee246d700c2436a85ddc537575de2aeacf/lucene/sandbox/src/java/org/apache/lucene/sandbox/document/FloatPointNearestNeighbor.java#L296
> - We could optimize counting the number of matches of a query by adding the number of points in a leaf without visiting documents where there are no deleted documents and a leaf fully matches the query.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org