Posted to issues@lucene.apache.org by "kkewwei (Jira)" <ji...@apache.org> on 2022/05/20 03:31:00 UTC
[jira] [Commented] (LUCENE-10516) reduce unnecessary loop matches in BKDReader

    [ https://issues.apache.org/jira/browse/LUCENE-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17539903#comment-17539903 ] 

kkewwei commented on LUCENE-10516:
----------------------------------

For sparse doc values, the data is stored compressed as runs of (sameCount, detailValue). In BKDReader, every docID in such a run is compared against the same value inside the loop, so the per-doc comparison driven by the iterator seems unnecessary.

{code:java}
// read cardinality and point
  private void visitSparseRawDocValues(int[] commonPrefixLengths, byte[] scratchPackedValue, IndexInput in, BKDReaderDocIDSetIterator scratchIterator, int count, IntersectVisitor visitor) throws IOException {
    int i;
    for (i = 0; i < count;) {
      // number of consecutive docs that share the same value
      int length = in.readVInt();
      // read the suffix bytes of that shared value for each dimension
      for(int dim = 0; dim < numDataDims; dim++) {
        int prefix = commonPrefixLengths[dim];
        in.readBytes(scratchPackedValue, dim*bytesPerDim + prefix, bytesPerDim - prefix);
      }
      scratchIterator.reset(i, length);
      // visit every doc in the run; all of them carry the same packed value
      visitor.visit(scratchIterator, scratchPackedValue);
      i += length;
    }
    if (i != count) {
      throw new CorruptIndexException("Sub blocks do not add up to the expected count: " + count + " != " + i, in);
    }
  }
{code}
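
The cost difference can be illustrated with a small, self-contained sketch. It is not Lucene code: `Run`, `matches`, `visitPerDoc`, and `visitPerRun` are hypothetical names, and values are plain ints rather than packed byte arrays. It only counts how often the range check runs when a run of docs shares one value.

```java
import java.util.ArrayList;
import java.util.List;

public class RunVisitSketch {
  /** Hypothetical stand-in for one (sameCount, value) run in a sparse leaf block. */
  public static final class Run {
    public final int firstDoc;
    public final int length;
    public final int value;

    public Run(int firstDoc, int length, int value) {
      this.firstDoc = firstDoc;
      this.length = length;
      this.value = value;
    }
  }

  /** Counts calls to matches() so both strategies can be compared. */
  public static int comparisons = 0;

  static boolean matches(int value, int lower, int upper) {
    comparisons++;
    return value >= lower && value <= upper;
  }

  /** Mirrors the default behaviour: the shared value is re-checked for every doc. */
  public static List<Integer> visitPerDoc(List<Run> runs, int lower, int upper) {
    List<Integer> hits = new ArrayList<>();
    for (Run run : runs) {
      for (int doc = run.firstDoc; doc < run.firstDoc + run.length; doc++) {
        if (matches(run.value, lower, upper)) {
          hits.add(doc);
        }
      }
    }
    return hits;
  }

  /** Checks the shared value once per run, then collects every doc in the run. */
  public static List<Integer> visitPerRun(List<Run> runs, int lower, int upper) {
    List<Integer> hits = new ArrayList<>();
    for (Run run : runs) {
      if (matches(run.value, lower, upper)) {
        for (int doc = run.firstDoc; doc < run.firstDoc + run.length; doc++) {
          hits.add(doc);
        }
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<Run> runs = List.of(new Run(0, 3, 5), new Run(3, 2, 50), new Run(5, 4, 7));

    comparisons = 0;
    List<Integer> perDoc = visitPerDoc(runs, 0, 10);
    System.out.println("per-doc hits=" + perDoc + " comparisons=" + comparisons);

    comparisons = 0;
    List<Integer> perRun = visitPerRun(runs, 0, 10);
    System.out.println("per-run hits=" + perRun + " comparisons=" + comparisons);
  }
}
```

Both strategies return the same docIDs; only the number of value comparisons differs, which is one per doc in the first case and one per run in the second.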


> reduce unnecessary loop matches in BKDReader
> --------------------------------------------
>
>                 Key: LUCENE-10516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10516
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 8.6.2
>            Reporter: kkewwei
>            Priority: Major
>
> In *BKDReader.visitSparseRawDocValues()*, we read a batch of docIDs that share the same point value, *scratchPackedValue*, then call *visitor.visit(scratchIterator, scratchPackedValue)* to find which docIDs match the range.
> {code:java}
> default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
>       int docID;
>       while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) { 
>         visit(docID, packedValue); 
>       }
>     }
> {code}
> The packedValue is the same for every docID in the batch, so if the first doc matches the range, all the other docIDs in the batch match it too. Re-checking the value inside the loop is therefore redundant.
> The value should be checked once per batch, as follows:
> {code:java}
>           public void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
>             if (matches(packedValue)) {
>               int docID;
>               while ((docID = iterator.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
>                 visit(docID);
>               }
>             }
>           }
> {code}
> https://github.com/apache/lucene/blob/2e941fcfed6cad3d9c8667ff5324cd04858ba547/lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java#L196
> Should we override *visit(DocIdSetIterator iterator, byte[] packedValue)* in *ExitableDirectoryReader$ExitableIntersectVisitor* so that the default implementation is not used, like this:
> {code:java}
>         @Override
>         public void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOException {
>             queryCancellation.checkCancelled();
>             in.visit(iterator, packedValue);
>         }
> {code}
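
Why the wrapper matters can be sketched in isolation. The types below are hypothetical and heavily simplified (int values, an int[] instead of a DocIdSetIterator); they are not Lucene's IntersectVisitor or ExitableIntersectVisitor. The sketch assumes only what the description states: an interface with a default bulk method that loops over the per-doc method, and a delegate that overrides the bulk method to check the value once.

```java
import java.util.ArrayList;
import java.util.List;

public class WrapperDelegationSketch {
  /** Hypothetical, simplified visitor interface. */
  public interface Visitor {
    void visit(int docID, int value);

    /** Default bulk path: one per-doc call for every doc in the batch. */
    default void visit(int[] docIDs, int value) {
      for (int docID : docIDs) {
        visit(docID, value);
      }
    }
  }

  /** Delegate that checks the shared value only once in its bulk override. */
  public static class RangeVisitor implements Visitor {
    public final List<Integer> hits = new ArrayList<>();
    public int comparisons = 0;
    private final int lower;
    private final int upper;

    public RangeVisitor(int lower, int upper) {
      this.lower = lower;
      this.upper = upper;
    }

    private boolean matches(int value) {
      comparisons++;
      return value >= lower && value <= upper;
    }

    @Override
    public void visit(int docID, int value) {
      if (matches(value)) {
        hits.add(docID);
      }
    }

    @Override
    public void visit(int[] docIDs, int value) {
      if (matches(value)) {
        for (int docID : docIDs) {
          hits.add(docID);
        }
      }
    }
  }

  /** Wrapper that only forwards the per-doc method, so bulk calls hit the default loop. */
  public static class PlainWrapper implements Visitor {
    protected final Visitor in;

    public PlainWrapper(Visitor in) {
      this.in = in;
    }

    @Override
    public void visit(int docID, int value) {
      in.visit(docID, value);
    }
  }

  /** Wrapper that also forwards the bulk method, preserving the delegate's fast path. */
  public static class ForwardingWrapper extends PlainWrapper {
    public ForwardingWrapper(Visitor in) {
      super(in);
    }

    @Override
    public void visit(int[] docIDs, int value) {
      in.visit(docIDs, value);
    }
  }

  public static void main(String[] args) {
    int[] batch = {0, 1, 2, 3};

    RangeVisitor a = new RangeVisitor(0, 10);
    new PlainWrapper(a).visit(batch, 5);
    System.out.println("plain wrapper comparisons=" + a.comparisons);

    RangeVisitor b = new RangeVisitor(0, 10);
    new ForwardingWrapper(b).visit(batch, 5);
    System.out.println("forwarding wrapper comparisons=" + b.comparisons);
  }
}
```

In this sketch, the wrapper without the bulk override silently routes every doc through the per-doc path, so the delegate's single-check optimization is never reached; forwarding the bulk call restores it.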


