You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/05/27 03:42:59 UTC

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #5444: Enhance and simplify the filtering

Jackie-Jiang opened a new pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444


   Removed the methods that complicate the code but provide no value:
   - BlockDocIdSet
     - getRaw
   - FilterBlockDocIdSet
     - getMinDocId
     - getMaxDocId
     - setStartDocId
     - setEndDocId
   - BlockDocIdIterator
     - currentDocId
   - ScanBasedDocIdIterator
     - isMatch
   
   Uniformed the behavior of all filter-related classes to bound the return docIds with numDocs
   Simplified the logic of AND/OR handling
   Pushed down the AND smart handling logic to BlockDocIdIterator to ensure the best performance:
   - AndDocIdSet might return RangelessBitmapDocIdIterator which can be merged with other iterators
   - OrDocIdSet might return BitmapDocIdIterator which can be merged with other iterators
   
   Tested all the queries (10K PQL, 10K SQL) within the query file using HybridClusterIntegrationTest


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-638356820


   > Changes to query execution engine should go through a performance benchmark.
   
   @mayankshriv Totally agree. But before the benchmark framework is ready, there is no way to test the performance thoroughly.
   I already tested the functionality with all the queries in the integration test, and conceptually this change will improve the performance because the it removes some redundant if checks and optimizes certain code paths. I also noticed that the tests are finished a little bit faster, but the difference is within error range.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r433642778



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/ArrayBasedDocIdIterator.java
##########
@@ -36,37 +35,15 @@ public ArrayBasedDocIdIterator(int[] docIds, int searchableLength) {
 
   @Override
   public int next() {
-    if (_currentDocId == Constants.EOF) {
-      return Constants.EOF;
-    }
-    if (++_currentIndex == _searchableLength) {
-      _currentDocId = Constants.EOF;
+    if (_nextIndex < _searchableLength) {
+      return _docIds[_nextIndex++];

Review comment:
       what happens if _nextIndex >= docIds.length? should we have a array length check?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432692609



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/ScanBasedDocIdIterator.java
##########
@@ -24,22 +24,18 @@
 
 
 /**
- * All scan based filter iterators must implement this interface. This allows intersection to be
- * optimized.
- * For example, if the we have two iterators one index based and another scan based, instead of

Review comment:
       This part is explained in the AndDocIdSet. Updated the javadoc so that it is more clear here as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432323606



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdSet.java
##########
@@ -18,9 +18,15 @@
  */
 package org.apache.pinot.core.common;
 
+/**
+ * The interface <code>BlockDocIdSet</code> represents all the matching document ids for a predicate.

Review comment:
       +1000 on removing this.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdIterator.java
##########
@@ -25,25 +25,16 @@
 public interface BlockDocIdIterator {
 
   /**
-   * Get the next document id.
-   *
-   * @return Next document id or EOF if there is no more documents
+   * Returns the next matched document id, or {@link Constants#EOF} if there is no more matched document.
+   * <p>NOTE: There should be no more call to this method after it returns {@link Constants#EOF}.
    */
   int next();
 
   /**
-   * Advance to the first document whose id is equal or greater than the given target document id.
-   * <p>If the given target document id is smaller or equal to the current document id, then return the current one.
-   *
-   * @param targetDocId The target document id
-   * @return First document id that is equal or greater than target or EOF if no document matches
+   * Returns the first matched document whose id is equal to or greater than the given target document id, or
+   * {@link Constants#EOF} if there is no such document.
+   * <p>NOTE: The target document id should be greater than the document id previous returned.

Review comment:
       What happens from an API point of view if the target is same as last returned matching docId?
   We won't throw error and return the target as is. So, the comment should state greater than or equal to.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat edited a comment on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat edited a comment on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-635802473


   > @chenboat There is no new added major classes. Most of the changes are making the classes compatible with the filter interface change, so it is very hard to break them into multiple PRs. I can make the removed interface in BlockValIterator a separate PR as they are independent of the filtering.
   > This PR is the first step of re-structuring the filtering in Pinot, so for historical reason the name for some interfaces (e.g. BlockDocIdSet) might be confusing. I wouldn't spend too much time renaming and documenting them because they are going to be re-structured in the following steps.
   
   Thanks for reducing the number of changed files. Can you update the summary as well? I also realized some new classes are just renaming of previously existing classes -- my bad.
    
   I still think there is room for breaking up this PR. Based on your summary there are multiple things going on in this PR:
    > 1. Uniformed the behavior of all filter-related classes to bound the return docIds with numDocs
    > 2. Simplified the logic of AND/OR handling
    > 3. Pushed down ... 
   
   Can we put the 3 items in 3 PR? I found there are non-trivial coding refactoring  (e.g., pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java) mixed with interface changes in this PR. Can we separate these into smaller PRs? 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432045689



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.docidsets;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.pinot.common.utils.Pairs.IntPair;
+import org.apache.pinot.core.common.BlockDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.AndDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.BitmapBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.RangelessBitmapDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.ScanBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.SortedDocIdIterator;
+import org.apache.pinot.core.util.SortedRangeIntersection;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * The FilterBlockDocIdSet to perform AND on all child FilterBlockDocIdSets.
+ * <p>The AndBlockDocIdSet will construct the BlockDocIdIterator based on the BlockDocIdIterators from the child
+ * FilterBlockDocIdSets:
+ * <ul>
+ *   <li>
+ *     When there are at least one index-base BlockDocIdIterators (SortedDocIdIterator or BitmapBasedDocIdIterator) and

Review comment:
       It merges all index-based and scan-based BlockDocIdIterators into one bitmap and construct a RangelessBitmapDocIdIterator over it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432244794



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/AndDocIdIterator.java
##########
@@ -18,112 +18,52 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import java.util.Arrays;
 import org.apache.pinot.core.common.BlockDocIdIterator;
 import org.apache.pinot.core.common.Constants;
 
 
-// TODO: Optimize this
 public final class AndDocIdIterator implements BlockDocIdIterator {
-  public final BlockDocIdIterator[] docIdIterators;
-  public ScanBasedDocIdIterator[] scanBasedDocIdIterators;
-  public final int[] docIdPointers;
-  public boolean reachedEnd = false;
-  public int currentDocId = -1;
-  int currentMax = -1;
-  private boolean hasScanBasedIterators;
+  public final BlockDocIdIterator[] _docIdIterators;
 
-  public AndDocIdIterator(BlockDocIdIterator[] blockDocIdIterators) {
-    int numIndexBasedIterators = 0;
-    int numScanBasedIterators = 0;
-    for (int i = 0; i < blockDocIdIterators.length; i++) {
-      if (blockDocIdIterators[i] instanceof IndexBasedDocIdIterator) {
-        numIndexBasedIterators = numIndexBasedIterators + 1;
-      } else if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-        numScanBasedIterators = numScanBasedIterators + 1;
-      }
-    }
-    // if we have at least one index based then do intersection based on index based only, and then
-    // check if matching docs apply on scan based iterator
-    if (numIndexBasedIterators > 0 && numScanBasedIterators > 0) {
-      hasScanBasedIterators = true;
-      int nonScanIteratorsSize = blockDocIdIterators.length - numScanBasedIterators;
-      this.docIdIterators = new BlockDocIdIterator[nonScanIteratorsSize];
-      this.scanBasedDocIdIterators = new ScanBasedDocIdIterator[numScanBasedIterators];
-      int nonScanBasedIndex = 0;
-      int scanBasedIndex = 0;
-      for (int i = 0; i < blockDocIdIterators.length; i++) {
-        if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-          this.scanBasedDocIdIterators[scanBasedIndex++] = (ScanBasedDocIdIterator) blockDocIdIterators[i];
-        } else {
-          this.docIdIterators[nonScanBasedIndex++] = blockDocIdIterators[i];
-        }
-      }
-    } else {
-      hasScanBasedIterators = false;
-      this.docIdIterators = blockDocIdIterators;
-    }
-    this.docIdPointers = new int[docIdIterators.length];
-    Arrays.fill(docIdPointers, -1);
-  }
+  private int _nextDocId = 0;
 
-  @Override
-  public int advance(int targetDocId) {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    if (currentDocId >= targetDocId) {
-      return currentDocId;
-    }
-    // next() method will always increment currentMax by 1.
-    currentMax = targetDocId - 1;
-    return next();
+  public AndDocIdIterator(BlockDocIdIterator[] docIdIterators) {
+    _docIdIterators = docIdIterators;
   }
 
   @Override
   public int next() {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    currentMax = currentMax + 1;
-    // always increment the pointer to current max, when this is called first time, every one will
-    // be set to start of posting list.
-    for (int i = 0; i < docIdIterators.length; i++) {
-      docIdPointers[i] = docIdIterators[i].advance(currentMax);
-      if (docIdPointers[i] == Constants.EOF) {
-        reachedEnd = true;
-        currentMax = Constants.EOF;
-        break;
-      }
-      if (docIdPointers[i] > currentMax) {
-        currentMax = docIdPointers[i];
-        if (i > 0) {
-          // we need to advance all pointer since we found a new max
-          i = -1;
-        }
-      }
-      if (hasScanBasedIterators && i == docIdIterators.length - 1) {
-        // this means we found the docId common to all nonScanBased iterators, now we need to ensure
-        // that its also found in scanBasedIterator, if not matched, we restart the intersection
-        for (ScanBasedDocIdIterator iterator : scanBasedDocIdIterators) {
-          if (!iterator.isMatch(currentMax)) {
-            i = -1;
-            currentMax = currentMax + 1;
-            break;
+    int maxDocId = _nextDocId;
+    int maxDocIdIndex = -1;
+    int numDocIdIterators = _docIdIterators.length;
+    int index = 0;
+    while (index < numDocIdIterators) {
+      if (index == maxDocIdIndex) {
+        // Skip the index with the max document id
+        index++;

Review comment:
       Changed the code a little bit so that it is more clear how index is kept in bound.
   Added AndDocIdIteratorTest and OrDocIdIteratorTest.
   I have run all the queries before submitting the PR. Will run them again before merging.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-635648350


   Removed the changes not closely related to the enhancement to reduce the size of the change


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-638416573


   @mayankshriv Did a rough performance benchmark running all the queries in the query file for 10 times (100K queries in total) in Hybrid, MultiNodesOffline and Realtime cluster integration test and the result is very close before and after this PR (both within the range of 23-24 seconds).
   So the gain of this PR is mostly for simplicity and readability of the code as well as enhancement for certain code path (e.g. the problem addressed in #5328 won't happen again)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432322094



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/filter/BitmapBasedFilterOperator.java
##########
@@ -25,62 +25,73 @@
 import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
 import org.apache.pinot.core.segment.index.readers.InvertedIndexReader;
 import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
+@SuppressWarnings("rawtypes")
 public class BitmapBasedFilterOperator extends BaseFilterOperator {
   private static final String OPERATOR_NAME = "BitmapBasedFilterOperator";
 
   private final PredicateEvaluator _predicateEvaluator;
-  private final DataSource _dataSource;
-  private final ImmutableRoaringBitmap[] _bitmaps;
-  private final int _startDocId;
-  // TODO: change it to exclusive
-  // Inclusive
-  private final int _endDocId;
+  private final InvertedIndexReader _invertedIndexReader;
+  private final ImmutableRoaringBitmap _docIds;
   private final boolean _exclusive;
+  private final int _numDocs;
 
-  BitmapBasedFilterOperator(PredicateEvaluator predicateEvaluator, DataSource dataSource, int startDocId,
-      int endDocId) {
-    // NOTE:
-    // Predicate that is always evaluated as true or false should not be passed into the BitmapBasedFilterOperator for
-    // performance concern.
-    // If predicate is always evaluated as true, use MatchAllFilterOperator; if predicate is always evaluated as false,
-    // use EmptyFilterOperator.
-    Preconditions.checkArgument(!predicateEvaluator.isAlwaysTrue() && !predicateEvaluator.isAlwaysFalse());
-
+  BitmapBasedFilterOperator(PredicateEvaluator predicateEvaluator, DataSource dataSource, int numDocs) {
     _predicateEvaluator = predicateEvaluator;
-    _dataSource = dataSource;
-    _bitmaps = null;
-    _startDocId = startDocId;
-    _endDocId = endDocId;
+    _invertedIndexReader = dataSource.getInvertedIndex();
+    _docIds = null;
     _exclusive = predicateEvaluator.isExclusive();
+    _numDocs = numDocs;
   }
 
-  public BitmapBasedFilterOperator(ImmutableRoaringBitmap[] bitmaps, int startDocId, int endDocId, boolean exclusive) {
+  public BitmapBasedFilterOperator(ImmutableRoaringBitmap docIds, boolean exclusive, int numDocs) {
     _predicateEvaluator = null;
-    _dataSource = null;
-    _bitmaps = bitmaps;
-    _startDocId = startDocId;
-    _endDocId = endDocId;
+    _invertedIndexReader = null;
+    _docIds = docIds;
     _exclusive = exclusive;
+    _numDocs = numDocs;
   }
 
   @Override
   protected FilterBlock getNextBlock() {
-    if (_bitmaps != null) {
-      return new FilterBlock(new BitmapDocIdSet(_bitmaps, _startDocId, _endDocId, _exclusive));
+    if (_docIds != null) {
+      if (_exclusive) {
+        return new FilterBlock(new BitmapDocIdSet(ImmutableRoaringBitmap.flip(_docIds, 0L, _numDocs), _numDocs));
+      } else {
+        return new FilterBlock(new BitmapDocIdSet(_docIds, _numDocs));
+      }
     }
 
     int[] dictIds = _exclusive ? _predicateEvaluator.getNonMatchingDictIds() : _predicateEvaluator.getMatchingDictIds();

Review comment:
       Is is possible to handle NOT_IN, NEQ exactly once?
   We checked for _exclusive and accordingly get non matching dictIds or matching dictIds based on whether it is true or false. So the predicate is already evaluated correctly. Now why can't we can just work on the docIds for these dictIds.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r431615072



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.docidsets;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.pinot.common.utils.Pairs.IntPair;
+import org.apache.pinot.core.common.BlockDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.AndDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.BitmapBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.RangelessBitmapDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.ScanBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.SortedDocIdIterator;
+import org.apache.pinot.core.util.SortedRangeIntersection;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * The FilterBlockDocIdSet to perform AND on all child FilterBlockDocIdSets.
+ * <p>The AndBlockDocIdSet will construct the BlockDocIdIterator based on the BlockDocIdIterators from the child
+ * FilterBlockDocIdSets:
+ * <ul>
+ *   <li>
+ *     When there are at least one index-base BlockDocIdIterators (SortedDocIdIterator or BitmapBasedDocIdIterator) and

Review comment:
       does it mean the class performs binary merge of 2 child BlockDocIdSets each time?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432699285



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java
##########
@@ -18,238 +18,138 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import org.apache.pinot.core.common.BlockMetadata;
 import org.apache.pinot.core.common.BlockSingleValIterator;
-import org.apache.pinot.core.common.BlockValSet;
 import org.apache.pinot.core.common.Constants;
 import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
-import org.apache.pinot.spi.data.FieldSpec;
 import org.roaringbitmap.IntIterator;
 import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
 import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
-public class SVScanDocIdIterator implements ScanBasedDocIdIterator {
-  private int _currentDocId = -1;
+public final class SVScanDocIdIterator implements ScanBasedDocIdIterator {
+  private final PredicateEvaluator _predicateEvaluator;
   private final BlockSingleValIterator _valueIterator;
-  private int _startDocId;
-  private int _endDocId;
-  private PredicateEvaluator _evaluator;
-  private String _operatorName;
-  private int _numEntriesScanned = 0;
+  private final int _numDocs;
   private final ValueMatcher _valueMatcher;
 
-  public SVScanDocIdIterator(String operatorName, BlockValSet blockValSet, BlockMetadata blockMetadata,
-      PredicateEvaluator evaluator) {
-    _operatorName = operatorName;
-    _evaluator = evaluator;
-    _valueIterator = (BlockSingleValIterator) blockValSet.iterator();
-
-    if (evaluator.isAlwaysFalse()) {
-      _currentDocId = Constants.EOF;
-      setStartDocId(Constants.EOF);
-      setEndDocId(Constants.EOF);
-    } else {
-      setStartDocId(blockMetadata.getStartDocId());
-      setEndDocId(blockMetadata.getEndDocId());
-    }
+  private int _nextDocId = 0;
+  private long _numEntriesScanned = 0L;
 
-    if (evaluator.isDictionaryBased()) {
-      _valueMatcher = new IntMatcher(); // Match using dictionary id's that are integers.
-    } else {
-      _valueMatcher = getValueMatcherForType(blockMetadata.getDataType());
-    }
-    _valueMatcher.setEvaluator(evaluator);
-  }
-
-  /**
-   * After setting the startDocId, next calls will always return from &gt;=startDocId
-   *
-   * @param startDocId Start doc id
-   */
-  public void setStartDocId(int startDocId) {
-    _currentDocId = startDocId - 1;
-    _valueIterator.skipTo(startDocId);
-    _startDocId = startDocId;
-  }
-
-  /**
-   * After setting the endDocId, next call will return Constants.EOF after currentDocId exceeds
-   * endDocId
-   *
-   * @param endDocId End doc id
-   */
-  public void setEndDocId(int endDocId) {
-    _endDocId = endDocId;
-  }
-
-  @Override
-  public boolean isMatch(int docId) {
-    if (_currentDocId == Constants.EOF) {
-      return false;
-    }
-    _valueIterator.skipTo(docId);
-    _numEntriesScanned++;
-    return _valueMatcher.doesCurrentEntryMatch(_valueIterator);
-  }
-
-  @Override
-  public int advance(int targetDocId) {
-    if (_currentDocId == Constants.EOF) {
-      return _currentDocId;
-    }
-    if (targetDocId < _startDocId) {
-      targetDocId = _startDocId;
-    } else if (targetDocId > _endDocId) {
-      _currentDocId = Constants.EOF;
-    }
-    if (_currentDocId >= targetDocId) {
-      return _currentDocId;
-    } else {
-      _currentDocId = targetDocId - 1;
-      _valueIterator.skipTo(targetDocId);
-      return next();
-    }
+  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, BlockSingleValIterator valueIterator, int numDocs) {
+    _predicateEvaluator = predicateEvaluator;
+    _valueIterator = valueIterator;
+    _numDocs = numDocs;
+    _valueMatcher = getValueMatcher();
   }
 
   @Override
   public int next() {
-    if (_currentDocId == Constants.EOF) {
-      return Constants.EOF;
-    }
-    while (_valueIterator.hasNext() && _currentDocId < _endDocId) {
-      _currentDocId = _currentDocId + 1;
+    while (_nextDocId < _numDocs) {
+      int nextDocId = _nextDocId++;
       _numEntriesScanned++;
-      if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-        return _currentDocId;
+      if (_valueMatcher.doesNextValueMatch()) {
+        return nextDocId;
       }
     }
-    _currentDocId = Constants.EOF;
     return Constants.EOF;
   }
 
   @Override
-  public int currentDocId() {
-    return _currentDocId;
-  }
-
-  @Override
-  public String toString() {
-    return SVScanDocIdIterator.class.getSimpleName() + "[" + _operatorName + "]";
+  public int advance(int targetDocId) {
+    _nextDocId = targetDocId;
+    _valueIterator.skipTo(targetDocId);
+    return next();
   }
 
   @Override
   public MutableRoaringBitmap applyAnd(ImmutableRoaringBitmap docIds) {
     MutableRoaringBitmap result = new MutableRoaringBitmap();
-    if (_evaluator.isAlwaysFalse()) {
-      return result;
-    }
-    IntIterator intIterator = docIds.getIntIterator();
-    int docId = -1;
-    while (intIterator.hasNext() && docId < _endDocId) {
-      docId = intIterator.next();
-      if (docId >= _startDocId) {
-        _valueIterator.skipTo(docId);
-        _numEntriesScanned++;
-        if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-          result.add(docId);
-        }
+    IntIterator docIdIterator = docIds.getIntIterator();
+    int nextDocId;
+    while (docIdIterator.hasNext() && (nextDocId = docIdIterator.next()) < _numDocs) {

Review comment:
       its ok to drop hasNext if it simplifies the implementation




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r430783521



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdSet.java
##########
@@ -21,6 +21,4 @@
 public interface BlockDocIdSet {
 
   BlockDocIdIterator iterator();

Review comment:
       Java doc over this method and/or the interface? Since we are refactoring this interface, it is a good time to add javadoc for a public interface as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432060103



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.docidsets;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.pinot.common.utils.Pairs.IntPair;
+import org.apache.pinot.core.common.BlockDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.AndDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.BitmapBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.RangelessBitmapDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.ScanBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.SortedDocIdIterator;
+import org.apache.pinot.core.util.SortedRangeIntersection;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * The FilterBlockDocIdSet to perform AND on all child FilterBlockDocIdSets.
+ * <p>The AndBlockDocIdSet will construct the BlockDocIdIterator based on the BlockDocIdIterators from the child
+ * FilterBlockDocIdSets:
+ * <ul>
+ *   <li>
+ *     When there are at least one index-base BlockDocIdIterators (SortedDocIdIterator or BitmapBasedDocIdIterator) and

Review comment:
       It uses a bitmap to perform AND for all index-based BlockDocIdIterators, and use ScanBasedDocIdIterator.applyAnd(docIds) to resolve the scan-based BlockDocIdIterators. After all, there will be one RangelessBitmapDocIdIterator on top of the result bitmap.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432323266



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/ScanBasedDocIdIterator.java
##########
@@ -24,22 +24,18 @@
 
 
 /**
- * All scan based filter iterators must implement this interface. This allows intersection to be
- * optimized.
- * For example, if the we have two iterators one index based and another scan based, instead of

Review comment:
       This is good piece of information. Why delete it? That's how the filtering will work right if we do 
   WHERE col1 = 200 AND col2 = 10 -- if there is an inverted index on col1 and no index on col2

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java
##########
@@ -18,238 +18,138 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import org.apache.pinot.core.common.BlockMetadata;
 import org.apache.pinot.core.common.BlockSingleValIterator;
-import org.apache.pinot.core.common.BlockValSet;
 import org.apache.pinot.core.common.Constants;
 import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
-import org.apache.pinot.spi.data.FieldSpec;
 import org.roaringbitmap.IntIterator;
 import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
 import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
-public class SVScanDocIdIterator implements ScanBasedDocIdIterator {
-  private int _currentDocId = -1;
+public final class SVScanDocIdIterator implements ScanBasedDocIdIterator {
+  private final PredicateEvaluator _predicateEvaluator;
   private final BlockSingleValIterator _valueIterator;
-  private int _startDocId;
-  private int _endDocId;
-  private PredicateEvaluator _evaluator;
-  private String _operatorName;
-  private int _numEntriesScanned = 0;
+  private final int _numDocs;
   private final ValueMatcher _valueMatcher;
 
-  public SVScanDocIdIterator(String operatorName, BlockValSet blockValSet, BlockMetadata blockMetadata,
-      PredicateEvaluator evaluator) {
-    _operatorName = operatorName;
-    _evaluator = evaluator;
-    _valueIterator = (BlockSingleValIterator) blockValSet.iterator();
-
-    if (evaluator.isAlwaysFalse()) {
-      _currentDocId = Constants.EOF;
-      setStartDocId(Constants.EOF);
-      setEndDocId(Constants.EOF);
-    } else {
-      setStartDocId(blockMetadata.getStartDocId());
-      setEndDocId(blockMetadata.getEndDocId());
-    }
+  private int _nextDocId = 0;
+  private long _numEntriesScanned = 0L;
 
-    if (evaluator.isDictionaryBased()) {
-      _valueMatcher = new IntMatcher(); // Match using dictionary id's that are integers.
-    } else {
-      _valueMatcher = getValueMatcherForType(blockMetadata.getDataType());
-    }
-    _valueMatcher.setEvaluator(evaluator);
-  }
-
-  /**
-   * After setting the startDocId, next calls will always return from &gt;=startDocId
-   *
-   * @param startDocId Start doc id
-   */
-  public void setStartDocId(int startDocId) {
-    _currentDocId = startDocId - 1;
-    _valueIterator.skipTo(startDocId);
-    _startDocId = startDocId;
-  }
-
-  /**
-   * After setting the endDocId, next call will return Constants.EOF after currentDocId exceeds
-   * endDocId
-   *
-   * @param endDocId End doc id
-   */
-  public void setEndDocId(int endDocId) {
-    _endDocId = endDocId;
-  }
-
-  @Override
-  public boolean isMatch(int docId) {
-    if (_currentDocId == Constants.EOF) {
-      return false;
-    }
-    _valueIterator.skipTo(docId);
-    _numEntriesScanned++;
-    return _valueMatcher.doesCurrentEntryMatch(_valueIterator);
-  }
-
-  @Override
-  public int advance(int targetDocId) {
-    if (_currentDocId == Constants.EOF) {
-      return _currentDocId;
-    }
-    if (targetDocId < _startDocId) {
-      targetDocId = _startDocId;
-    } else if (targetDocId > _endDocId) {
-      _currentDocId = Constants.EOF;
-    }
-    if (_currentDocId >= targetDocId) {
-      return _currentDocId;
-    } else {
-      _currentDocId = targetDocId - 1;
-      _valueIterator.skipTo(targetDocId);
-      return next();
-    }
+  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, BlockSingleValIterator valueIterator, int numDocs) {
+    _predicateEvaluator = predicateEvaluator;
+    _valueIterator = valueIterator;
+    _numDocs = numDocs;
+    _valueMatcher = getValueMatcher();
   }
 
   @Override
   public int next() {
-    if (_currentDocId == Constants.EOF) {
-      return Constants.EOF;
-    }
-    while (_valueIterator.hasNext() && _currentDocId < _endDocId) {
-      _currentDocId = _currentDocId + 1;
+    while (_nextDocId < _numDocs) {
+      int nextDocId = _nextDocId++;
       _numEntriesScanned++;
-      if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-        return _currentDocId;
+      if (_valueMatcher.doesNextValueMatch()) {
+        return nextDocId;
       }
     }
-    _currentDocId = Constants.EOF;
     return Constants.EOF;
   }
 
   @Override
-  public int currentDocId() {
-    return _currentDocId;
-  }
-
-  @Override
-  public String toString() {
-    return SVScanDocIdIterator.class.getSimpleName() + "[" + _operatorName + "]";
+  public int advance(int targetDocId) {
+    _nextDocId = targetDocId;
+    _valueIterator.skipTo(targetDocId);
+    return next();
   }
 
   @Override
   public MutableRoaringBitmap applyAnd(ImmutableRoaringBitmap docIds) {
     MutableRoaringBitmap result = new MutableRoaringBitmap();
-    if (_evaluator.isAlwaysFalse()) {
-      return result;
-    }
-    IntIterator intIterator = docIds.getIntIterator();
-    int docId = -1;
-    while (intIterator.hasNext() && docId < _endDocId) {
-      docId = intIterator.next();
-      if (docId >= _startDocId) {
-        _valueIterator.skipTo(docId);
-        _numEntriesScanned++;
-        if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-          result.add(docId);
-        }
+    IntIterator docIdIterator = docIds.getIntIterator();
+    int nextDocId;
+    while (docIdIterator.hasNext() && (nextDocId = docIdIterator.next()) < _numDocs) {

Review comment:
       I think generally hasNext() should be called before next() like it is done here. But I think our next() implementations (at least in some of the cases), have hasNext() called internally from next() as well. 
   So may be we should not simply do `(nextDocId = docIdIterator.next()) < _numDocs`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
kishoreg commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r430777214



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/SingleColumnMultiValueWriter.java
##########
@@ -19,60 +19,14 @@
 package org.apache.pinot.core.io.writer;
 
 public interface SingleColumnMultiValueWriter extends DataFileWriter {
-  /**

Review comment:
       we don't want to remove these. There are usecases where we will support no-dictionary modes

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/readerwriter/impl/FixedByteSingleColumnMultiValueReaderWriter.java
##########
@@ -277,42 +260,6 @@ public void setStringArray(int row, String[] stringArray) {
     }
   }
 
-  @Override

Review comment:
       why are we removing these methods, dont we need them for no-dictionary?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockValIterator.java
##########
@@ -22,6 +22,8 @@
 
   boolean hasNext();
 
+  int getNextDocId();

Review comment:
       why is blockValIterator returning getNextDocId ?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdSet.java
##########
@@ -21,6 +21,4 @@
 public interface BlockDocIdSet {
 
   BlockDocIdIterator iterator();
-

Review comment:
       this is awesome




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r431606971



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockMultiValIterator.java
##########
@@ -18,33 +18,15 @@
  */
 package org.apache.pinot.core.common;
 
-/**
- *
- *
- */
-public abstract class BlockMultiValIterator implements BlockValIterator {
-
-  public int nextCharVal(char[] charArray) {
-    throw new UnsupportedOperationException();
-  }
+public interface BlockMultiValIterator extends BlockValIterator {
 
-  public int nextIntVal(int[] intArray) {
-    throw new UnsupportedOperationException();
-  }
+  int nextIntVal(int[] intArray);

Review comment:
       it is worthwhile to explain how to use this method? what is the input array?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-635136647


   This PR is too big to review effectively IMO. 89 files in total. Can you break it down to 3-4 smaller PRs for review: one covering method removal;  1 or 2 PR each covering a new added major classes (e.g., SortedIndexBasedFilterOperator.java).
   
   > Removed the methods that complicate the code but provide no value:
   > 
   > * BlockDocIdSet
   >   
   >   * getRaw
   > * FilterBlockDocIdSet
   >   
   >   * getMinDocId
   >   * getMaxDocId
   >   * setStartDocId
   >   * setEndDocId
   > * BlockDocIdIterator
   >   
   >   * currentDocId
   > * ScanBasedDocIdIterator
   >   
   >   * isMatch
   > 
   > Uniformed the behavior of all filter-related classes to bound the return docIds with numDocs
   > Simplified the logic of AND/OR handling
   > Pushed down the AND smart handling logic to BlockDocIdIterator to ensure the best performance:
   > 
   > * AndDocIdSet might return RangelessBitmapDocIdIterator which can be merged with other iterators
   > * OrDocIdSet might return BitmapDocIdIterator which can be merged with other iterators
   > 
   > Tested all the queries (10K PQL, 10K SQL) within the query file using HybridClusterIntegrationTest
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r433647278



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/AndDocIdIterator.java
##########
@@ -18,112 +18,52 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import java.util.Arrays;
 import org.apache.pinot.core.common.BlockDocIdIterator;
 import org.apache.pinot.core.common.Constants;
 
 
-// TODO: Optimize this
 public final class AndDocIdIterator implements BlockDocIdIterator {

Review comment:
       1.javadoc?
   2. comments about thread safety. is it safe/advisable to use this iterator in more than one thread? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432678997



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/filter/SortedIndexBasedFilterOperator.java
##########
@@ -0,0 +1,145 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.filter;
+
+import com.google.common.base.Preconditions;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.pinot.common.utils.Pairs.IntPair;
+import org.apache.pinot.core.common.DataSource;
+import org.apache.pinot.core.io.reader.impl.v1.SortedIndexReader;
+import org.apache.pinot.core.operator.blocks.FilterBlock;
+import org.apache.pinot.core.operator.docidsets.SortedDocIdSet;
+import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
+import org.apache.pinot.core.operator.filter.predicate.RangePredicateEvaluatorFactory.OfflineDictionaryBasedRangePredicateEvaluator;
+
+
+@SuppressWarnings("rawtypes")
+public class SortedIndexBasedFilterOperator extends BaseFilterOperator {
+  private static final String OPERATOR_NAME = "SortedIndexBasedFilterOperator";
+
+  private final PredicateEvaluator _predicateEvaluator;
+  private final SortedIndexReader _sortedIndexReader;
+  private final int _numDocs;
+
+  SortedIndexBasedFilterOperator(PredicateEvaluator predicateEvaluator, DataSource dataSource, int numDocs) {
+    _predicateEvaluator = predicateEvaluator;
+    _sortedIndexReader = (SortedIndexReader) dataSource.getInvertedIndex();
+    _numDocs = numDocs;
+  }
+
+  @Override
+  protected FilterBlock getNextBlock() {
+    // At this point, we need to create a list of matching docId ranges. There are two kinds of operators:
+    //
+    // - "Additive" operators, such as EQ, IN and RANGE build up a list of ranges and merge overlapping/adjacent ones,
+    //   clipping the ranges to [startDocId; endDocId]
+    //
+    // - "Subtractive" operators, such as NEQ and NOT IN build up a list of ranges that do not match and build a list of
+    //   matching intervals by subtracting a list of non-matching intervals from the given range of
+    //   [startDocId; endDocId]
+    //
+    // For now, we don't look at the cardinality of the column's dictionary, although we should do that if someone

Review comment:
       I didn't change this block of comments, but seems it identifies it as a new class. Let me update it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-635539613


   > This PR is too big to review effectively IMO. 89 files in total. Can you break it down to 3-4 smaller PRs for review: one covering method removal; 1 or 2 PR each covering a new added major classes (e.g., SortedIndexBasedFilterOperator.java).
   
   @chenboat There is no new added major classes. Most of the changes are making the classes compatible with the filter interface change, so it is very hard to break them into multiple PRs. I can make the removed interface in BlockValIterator a separate PR as they are independent of the filtering.
   This PR is the first step of re-structuring the filtering in Pinot, so for historical reason the name for some interfaces (e.g. BlockDocIdSet) might be confusing. I wouldn't spend too much time renaming and documenting them because they are going to be re-structured in the following steps.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432321846



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/filter/SortedIndexBasedFilterOperator.java
##########
@@ -0,0 +1,145 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.filter;
+
+import com.google.common.base.Preconditions;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import org.apache.pinot.common.utils.Pairs.IntPair;
+import org.apache.pinot.core.common.DataSource;
+import org.apache.pinot.core.io.reader.impl.v1.SortedIndexReader;
+import org.apache.pinot.core.operator.blocks.FilterBlock;
+import org.apache.pinot.core.operator.docidsets.SortedDocIdSet;
+import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
+import org.apache.pinot.core.operator.filter.predicate.RangePredicateEvaluatorFactory.OfflineDictionaryBasedRangePredicateEvaluator;
+
+
+@SuppressWarnings("rawtypes")
+public class SortedIndexBasedFilterOperator extends BaseFilterOperator {
+  private static final String OPERATOR_NAME = "SortedIndexBasedFilterOperator";
+
+  private final PredicateEvaluator _predicateEvaluator;
+  private final SortedIndexReader _sortedIndexReader;
+  private final int _numDocs;
+
+  SortedIndexBasedFilterOperator(PredicateEvaluator predicateEvaluator, DataSource dataSource, int numDocs) {
+    _predicateEvaluator = predicateEvaluator;
+    _sortedIndexReader = (SortedIndexReader) dataSource.getInvertedIndex();
+    _numDocs = numDocs;
+  }
+
+  @Override
+  protected FilterBlock getNextBlock() {
+    // At this point, we need to create a list of matching docId ranges. There are two kinds of operators:
+    //
+    // - "Additive" operators, such as EQ, IN and RANGE build up a list of ranges and merge overlapping/adjacent ones,
+    //   clipping the ranges to [startDocId; endDocId]
+    //
+    // - "Subtractive" operators, such as NEQ and NOT IN build up a list of ranges that do not match and build a list of
+    //   matching intervals by subtracting a list of non-matching intervals from the given range of
+    //   [startDocId; endDocId]
+    //
+    // For now, we don't look at the cardinality of the column's dictionary, although we should do that if someone

Review comment:
       (nit): should just be cardinality of the column (since size of dictionary is equal to cardinality)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r431604976



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockSingleValIterator.java
##########
@@ -18,37 +18,17 @@
  */
 package org.apache.pinot.core.common;
 
-/**
- *
- * TODO: Split into two classes, one for iterator over data, another over dictionary id's.
- */
-public abstract class BlockSingleValIterator implements BlockValIterator {
-
-  char nextCharVal() {
-    throw new UnsupportedOperationException();
-  }
+public interface BlockSingleValIterator extends BlockValIterator {

Review comment:
       javadoc




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r430789063



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/readerwriter/impl/FixedByteSingleColumnMultiValueReaderWriter.java
##########
@@ -277,42 +260,6 @@ public void setStringArray(int row, String[] stringArray) {
     }
   }
 
-  @Override

Review comment:
       No, I only removed the ones that does not apply to Pinot types (we don't support MV BYTES).

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/io/writer/SingleColumnMultiValueWriter.java
##########
@@ -19,60 +19,14 @@
 package org.apache.pinot.core.io.writer;
 
 public interface SingleColumnMultiValueWriter extends DataFileWriter {
-  /**

Review comment:
       Same here. I only removed the ones that does not apply to Pinot data types.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockValIterator.java
##########
@@ -22,6 +22,8 @@
 
   boolean hasNext();
 
+  int getNextDocId();

Review comment:
       This iterator is not only iterator but supports `skipTo(docId)` and `reset()`. It is easier to track the docId here comparing to tracking it on the caller side. I'll add some javadoc to explain this

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdSet.java
##########
@@ -21,6 +21,4 @@
 public interface BlockDocIdSet {
 
   BlockDocIdIterator iterator();

Review comment:
       Good point, will add javadoc




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r431604484



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockMultiValIterator.java
##########
@@ -18,33 +18,15 @@
  */
 package org.apache.pinot.core.common;
 
-/**
- *
- *
- */
-public abstract class BlockMultiValIterator implements BlockValIterator {
-
-  public int nextCharVal(char[] charArray) {
-    throw new UnsupportedOperationException();
-  }
+public interface BlockMultiValIterator extends BlockValIterator {

Review comment:
       javadoc?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r433641032



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/AndDocIdIterator.java
##########
@@ -18,112 +18,52 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import java.util.Arrays;
 import org.apache.pinot.core.common.BlockDocIdIterator;
 import org.apache.pinot.core.common.Constants;
 
 
-// TODO: Optimize this
 public final class AndDocIdIterator implements BlockDocIdIterator {
-  public final BlockDocIdIterator[] docIdIterators;
-  public ScanBasedDocIdIterator[] scanBasedDocIdIterators;
-  public final int[] docIdPointers;
-  public boolean reachedEnd = false;
-  public int currentDocId = -1;
-  int currentMax = -1;
-  private boolean hasScanBasedIterators;
+  public final BlockDocIdIterator[] _docIdIterators;
 
-  public AndDocIdIterator(BlockDocIdIterator[] blockDocIdIterators) {
-    int numIndexBasedIterators = 0;
-    int numScanBasedIterators = 0;
-    for (int i = 0; i < blockDocIdIterators.length; i++) {
-      if (blockDocIdIterators[i] instanceof IndexBasedDocIdIterator) {
-        numIndexBasedIterators = numIndexBasedIterators + 1;
-      } else if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-        numScanBasedIterators = numScanBasedIterators + 1;
-      }
-    }
-    // if we have at least one index based then do intersection based on index based only, and then
-    // check if matching docs apply on scan based iterator
-    if (numIndexBasedIterators > 0 && numScanBasedIterators > 0) {
-      hasScanBasedIterators = true;
-      int nonScanIteratorsSize = blockDocIdIterators.length - numScanBasedIterators;
-      this.docIdIterators = new BlockDocIdIterator[nonScanIteratorsSize];
-      this.scanBasedDocIdIterators = new ScanBasedDocIdIterator[numScanBasedIterators];
-      int nonScanBasedIndex = 0;
-      int scanBasedIndex = 0;
-      for (int i = 0; i < blockDocIdIterators.length; i++) {
-        if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-          this.scanBasedDocIdIterators[scanBasedIndex++] = (ScanBasedDocIdIterator) blockDocIdIterators[i];
-        } else {
-          this.docIdIterators[nonScanBasedIndex++] = blockDocIdIterators[i];
-        }
-      }
-    } else {
-      hasScanBasedIterators = false;
-      this.docIdIterators = blockDocIdIterators;
-    }
-    this.docIdPointers = new int[docIdIterators.length];
-    Arrays.fill(docIdPointers, -1);
-  }
+  private int _nextDocId = 0;
 
-  @Override
-  public int advance(int targetDocId) {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    if (currentDocId >= targetDocId) {
-      return currentDocId;
-    }
-    // next() method will always increment currentMax by 1.
-    currentMax = targetDocId - 1;
-    return next();
+  public AndDocIdIterator(BlockDocIdIterator[] docIdIterators) {
+    _docIdIterators = docIdIterators;
   }
 
   @Override
   public int next() {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    currentMax = currentMax + 1;
-    // always increment the pointer to current max, when this is called first time, every one will
-    // be set to start of posting list.
-    for (int i = 0; i < docIdIterators.length; i++) {
-      docIdPointers[i] = docIdIterators[i].advance(currentMax);
-      if (docIdPointers[i] == Constants.EOF) {
-        reachedEnd = true;
-        currentMax = Constants.EOF;
-        break;
-      }
-      if (docIdPointers[i] > currentMax) {
-        currentMax = docIdPointers[i];
-        if (i > 0) {
-          // we need to advance all pointer since we found a new max
-          i = -1;
-        }
+    int maxDocId = _nextDocId;

Review comment:
       i think a better name could be candidateDocId because the while loop() is searching for the next doc Id exists in all iterators. maxDocId to many ppl mean the maximum doc id in the iterator.  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432043014



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdSet.java
##########
@@ -21,6 +21,4 @@
 public interface BlockDocIdSet {
 
   BlockDocIdIterator iterator();

Review comment:
       The result from the Operator is called Block, and BlockDocIdSet is the document ids from a Block. It is not structured very well, but it is out of the scope of this PR. We have plan to clean that up later.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-636166279






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432323335



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java
##########
@@ -18,238 +18,138 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import org.apache.pinot.core.common.BlockMetadata;
 import org.apache.pinot.core.common.BlockSingleValIterator;
-import org.apache.pinot.core.common.BlockValSet;
 import org.apache.pinot.core.common.Constants;
 import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
-import org.apache.pinot.spi.data.FieldSpec;
 import org.roaringbitmap.IntIterator;
 import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
 import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
-public class SVScanDocIdIterator implements ScanBasedDocIdIterator {
-  private int _currentDocId = -1;
+public final class SVScanDocIdIterator implements ScanBasedDocIdIterator {
+  private final PredicateEvaluator _predicateEvaluator;
   private final BlockSingleValIterator _valueIterator;
-  private int _startDocId;
-  private int _endDocId;
-  private PredicateEvaluator _evaluator;
-  private String _operatorName;
-  private int _numEntriesScanned = 0;
+  private final int _numDocs;
   private final ValueMatcher _valueMatcher;
 
-  public SVScanDocIdIterator(String operatorName, BlockValSet blockValSet, BlockMetadata blockMetadata,
-      PredicateEvaluator evaluator) {
-    _operatorName = operatorName;
-    _evaluator = evaluator;
-    _valueIterator = (BlockSingleValIterator) blockValSet.iterator();
-
-    if (evaluator.isAlwaysFalse()) {
-      _currentDocId = Constants.EOF;
-      setStartDocId(Constants.EOF);
-      setEndDocId(Constants.EOF);
-    } else {
-      setStartDocId(blockMetadata.getStartDocId());
-      setEndDocId(blockMetadata.getEndDocId());
-    }
+  private int _nextDocId = 0;
+  private long _numEntriesScanned = 0L;
 
-    if (evaluator.isDictionaryBased()) {
-      _valueMatcher = new IntMatcher(); // Match using dictionary id's that are integers.
-    } else {
-      _valueMatcher = getValueMatcherForType(blockMetadata.getDataType());
-    }
-    _valueMatcher.setEvaluator(evaluator);
-  }
-
-  /**
-   * After setting the startDocId, next calls will always return from &gt;=startDocId
-   *
-   * @param startDocId Start doc id
-   */
-  public void setStartDocId(int startDocId) {
-    _currentDocId = startDocId - 1;
-    _valueIterator.skipTo(startDocId);
-    _startDocId = startDocId;
-  }
-
-  /**
-   * After setting the endDocId, next call will return Constants.EOF after currentDocId exceeds
-   * endDocId
-   *
-   * @param endDocId End doc id
-   */
-  public void setEndDocId(int endDocId) {
-    _endDocId = endDocId;
-  }
-
-  @Override
-  public boolean isMatch(int docId) {
-    if (_currentDocId == Constants.EOF) {
-      return false;
-    }
-    _valueIterator.skipTo(docId);
-    _numEntriesScanned++;
-    return _valueMatcher.doesCurrentEntryMatch(_valueIterator);
-  }
-
-  @Override
-  public int advance(int targetDocId) {
-    if (_currentDocId == Constants.EOF) {
-      return _currentDocId;
-    }
-    if (targetDocId < _startDocId) {
-      targetDocId = _startDocId;
-    } else if (targetDocId > _endDocId) {
-      _currentDocId = Constants.EOF;
-    }
-    if (_currentDocId >= targetDocId) {
-      return _currentDocId;
-    } else {
-      _currentDocId = targetDocId - 1;
-      _valueIterator.skipTo(targetDocId);
-      return next();
-    }
+  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, BlockSingleValIterator valueIterator, int numDocs) {
+    _predicateEvaluator = predicateEvaluator;
+    _valueIterator = valueIterator;
+    _numDocs = numDocs;
+    _valueMatcher = getValueMatcher();
   }
 
   @Override
   public int next() {
-    if (_currentDocId == Constants.EOF) {
-      return Constants.EOF;
-    }
-    while (_valueIterator.hasNext() && _currentDocId < _endDocId) {
-      _currentDocId = _currentDocId + 1;
+    while (_nextDocId < _numDocs) {
+      int nextDocId = _nextDocId++;
       _numEntriesScanned++;
-      if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-        return _currentDocId;
+      if (_valueMatcher.doesNextValueMatch()) {
+        return nextDocId;
       }
     }
-    _currentDocId = Constants.EOF;
     return Constants.EOF;
   }
 
   @Override
-  public int currentDocId() {
-    return _currentDocId;
-  }
-
-  @Override
-  public String toString() {
-    return SVScanDocIdIterator.class.getSimpleName() + "[" + _operatorName + "]";
+  public int advance(int targetDocId) {
+    _nextDocId = targetDocId;
+    _valueIterator.skipTo(targetDocId);
+    return next();
   }
 
   @Override
   public MutableRoaringBitmap applyAnd(ImmutableRoaringBitmap docIds) {
     MutableRoaringBitmap result = new MutableRoaringBitmap();
-    if (_evaluator.isAlwaysFalse()) {
-      return result;
-    }
-    IntIterator intIterator = docIds.getIntIterator();
-    int docId = -1;
-    while (intIterator.hasNext() && docId < _endDocId) {
-      docId = intIterator.next();
-      if (docId >= _startDocId) {
-        _valueIterator.skipTo(docId);
-        _numEntriesScanned++;
-        if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-          result.add(docId);
-        }
+    IntIterator docIdIterator = docIds.getIntIterator();
+    int nextDocId;
+    while (docIdIterator.hasNext() && (nextDocId = docIdIterator.next()) < _numDocs) {

Review comment:
       I think generally hasNext() should be called before next() like it is done here. But I think our next() implementations (at least in some of the cases), have hasNext() called internally from next() as well. 
   So may be we should simply do `(nextDocId = docIdIterator.next()) < _numDocs`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432681482



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/filter/BitmapBasedFilterOperator.java
##########
@@ -25,62 +25,73 @@
 import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
 import org.apache.pinot.core.segment.index.readers.InvertedIndexReader;
 import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
+@SuppressWarnings("rawtypes")
 public class BitmapBasedFilterOperator extends BaseFilterOperator {
   private static final String OPERATOR_NAME = "BitmapBasedFilterOperator";
 
   private final PredicateEvaluator _predicateEvaluator;
-  private final DataSource _dataSource;
-  private final ImmutableRoaringBitmap[] _bitmaps;
-  private final int _startDocId;
-  // TODO: change it to exclusive
-  // Inclusive
-  private final int _endDocId;
+  private final InvertedIndexReader _invertedIndexReader;
+  private final ImmutableRoaringBitmap _docIds;
   private final boolean _exclusive;
+  private final int _numDocs;
 
-  BitmapBasedFilterOperator(PredicateEvaluator predicateEvaluator, DataSource dataSource, int startDocId,
-      int endDocId) {
-    // NOTE:
-    // Predicate that is always evaluated as true or false should not be passed into the BitmapBasedFilterOperator for
-    // performance concern.
-    // If predicate is always evaluated as true, use MatchAllFilterOperator; if predicate is always evaluated as false,
-    // use EmptyFilterOperator.
-    Preconditions.checkArgument(!predicateEvaluator.isAlwaysTrue() && !predicateEvaluator.isAlwaysFalse());
-
+  BitmapBasedFilterOperator(PredicateEvaluator predicateEvaluator, DataSource dataSource, int numDocs) {
     _predicateEvaluator = predicateEvaluator;
-    _dataSource = dataSource;
-    _bitmaps = null;
-    _startDocId = startDocId;
-    _endDocId = endDocId;
+    _invertedIndexReader = dataSource.getInvertedIndex();
+    _docIds = null;
     _exclusive = predicateEvaluator.isExclusive();
+    _numDocs = numDocs;
   }
 
-  public BitmapBasedFilterOperator(ImmutableRoaringBitmap[] bitmaps, int startDocId, int endDocId, boolean exclusive) {
+  public BitmapBasedFilterOperator(ImmutableRoaringBitmap docIds, boolean exclusive, int numDocs) {
     _predicateEvaluator = null;
-    _dataSource = null;
-    _bitmaps = bitmaps;
-    _startDocId = startDocId;
-    _endDocId = endDocId;
+    _invertedIndexReader = null;
+    _docIds = docIds;
     _exclusive = exclusive;
+    _numDocs = numDocs;
   }
 
   @Override
   protected FilterBlock getNextBlock() {
-    if (_bitmaps != null) {
-      return new FilterBlock(new BitmapDocIdSet(_bitmaps, _startDocId, _endDocId, _exclusive));
+    if (_docIds != null) {
+      if (_exclusive) {
+        return new FilterBlock(new BitmapDocIdSet(ImmutableRoaringBitmap.flip(_docIds, 0L, _numDocs), _numDocs));
+      } else {
+        return new FilterBlock(new BitmapDocIdSet(_docIds, _numDocs));
+      }
     }
 
     int[] dictIds = _exclusive ? _predicateEvaluator.getNonMatchingDictIds() : _predicateEvaluator.getMatchingDictIds();

Review comment:
       For exclusive predicate, we need to flip (inverse) the bitmap so that the result bitmap can reflect the matching docIds.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432211637



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/AndDocIdIterator.java
##########
@@ -18,112 +18,52 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import java.util.Arrays;
 import org.apache.pinot.core.common.BlockDocIdIterator;
 import org.apache.pinot.core.common.Constants;
 
 
-// TODO: Optimize this
 public final class AndDocIdIterator implements BlockDocIdIterator {
-  public final BlockDocIdIterator[] docIdIterators;
-  public ScanBasedDocIdIterator[] scanBasedDocIdIterators;
-  public final int[] docIdPointers;
-  public boolean reachedEnd = false;
-  public int currentDocId = -1;
-  int currentMax = -1;
-  private boolean hasScanBasedIterators;
+  public final BlockDocIdIterator[] _docIdIterators;
 
-  public AndDocIdIterator(BlockDocIdIterator[] blockDocIdIterators) {
-    int numIndexBasedIterators = 0;
-    int numScanBasedIterators = 0;
-    for (int i = 0; i < blockDocIdIterators.length; i++) {
-      if (blockDocIdIterators[i] instanceof IndexBasedDocIdIterator) {
-        numIndexBasedIterators = numIndexBasedIterators + 1;
-      } else if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-        numScanBasedIterators = numScanBasedIterators + 1;
-      }
-    }
-    // if we have at least one index based then do intersection based on index based only, and then
-    // check if matching docs apply on scan based iterator
-    if (numIndexBasedIterators > 0 && numScanBasedIterators > 0) {
-      hasScanBasedIterators = true;
-      int nonScanIteratorsSize = blockDocIdIterators.length - numScanBasedIterators;
-      this.docIdIterators = new BlockDocIdIterator[nonScanIteratorsSize];
-      this.scanBasedDocIdIterators = new ScanBasedDocIdIterator[numScanBasedIterators];
-      int nonScanBasedIndex = 0;
-      int scanBasedIndex = 0;
-      for (int i = 0; i < blockDocIdIterators.length; i++) {
-        if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-          this.scanBasedDocIdIterators[scanBasedIndex++] = (ScanBasedDocIdIterator) blockDocIdIterators[i];
-        } else {
-          this.docIdIterators[nonScanBasedIndex++] = blockDocIdIterators[i];
-        }
-      }
-    } else {
-      hasScanBasedIterators = false;
-      this.docIdIterators = blockDocIdIterators;
-    }
-    this.docIdPointers = new int[docIdIterators.length];
-    Arrays.fill(docIdPointers, -1);
-  }
+  private int _nextDocId = 0;
 
-  @Override
-  public int advance(int targetDocId) {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    if (currentDocId >= targetDocId) {
-      return currentDocId;
-    }
-    // next() method will always increment currentMax by 1.
-    currentMax = targetDocId - 1;
-    return next();
+  public AndDocIdIterator(BlockDocIdIterator[] docIdIterators) {
+    _docIdIterators = docIdIterators;
   }
 
   @Override
   public int next() {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    currentMax = currentMax + 1;
-    // always increment the pointer to current max, when this is called first time, every one will
-    // be set to start of posting list.
-    for (int i = 0; i < docIdIterators.length; i++) {
-      docIdPointers[i] = docIdIterators[i].advance(currentMax);
-      if (docIdPointers[i] == Constants.EOF) {
-        reachedEnd = true;
-        currentMax = Constants.EOF;
-        break;
-      }
-      if (docIdPointers[i] > currentMax) {
-        currentMax = docIdPointers[i];
-        if (i > 0) {
-          // we need to advance all pointer since we found a new max
-          i = -1;
-        }
-      }
-      if (hasScanBasedIterators && i == docIdIterators.length - 1) {
-        // this means we found the docId common to all nonScanBased iterators, now we need to ensure
-        // that its also found in scanBasedIterator, if not matched, we restart the intersection
-        for (ScanBasedDocIdIterator iterator : scanBasedDocIdIterators) {
-          if (!iterator.isMatch(currentMax)) {
-            i = -1;
-            currentMax = currentMax + 1;
-            break;
+    int maxDocId = _nextDocId;
+    int maxDocIdIndex = -1;
+    int numDocIdIterators = _docIdIterators.length;
+    int index = 0;
+    while (index < numDocIdIterators) {
+      if (index == maxDocIdIndex) {
+        // Skip the index with the max document id
+        index++;

Review comment:
       It will be useful to add some comments here as to how we make sure that `index` never overruns. It took me a couple of iterations to get the test cases to test this.
   
   On that topic, will be useful to add a unit test for this class.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r434064515



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/AndDocIdIterator.java
##########
@@ -18,112 +18,52 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import java.util.Arrays;
 import org.apache.pinot.core.common.BlockDocIdIterator;
 import org.apache.pinot.core.common.Constants;
 
 
-// TODO: Optimize this
 public final class AndDocIdIterator implements BlockDocIdIterator {
-  public final BlockDocIdIterator[] docIdIterators;
-  public ScanBasedDocIdIterator[] scanBasedDocIdIterators;
-  public final int[] docIdPointers;
-  public boolean reachedEnd = false;
-  public int currentDocId = -1;
-  int currentMax = -1;
-  private boolean hasScanBasedIterators;
+  public final BlockDocIdIterator[] _docIdIterators;
 
-  public AndDocIdIterator(BlockDocIdIterator[] blockDocIdIterators) {
-    int numIndexBasedIterators = 0;
-    int numScanBasedIterators = 0;
-    for (int i = 0; i < blockDocIdIterators.length; i++) {
-      if (blockDocIdIterators[i] instanceof IndexBasedDocIdIterator) {
-        numIndexBasedIterators = numIndexBasedIterators + 1;
-      } else if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-        numScanBasedIterators = numScanBasedIterators + 1;
-      }
-    }
-    // if we have at least one index based then do intersection based on index based only, and then
-    // check if matching docs apply on scan based iterator
-    if (numIndexBasedIterators > 0 && numScanBasedIterators > 0) {
-      hasScanBasedIterators = true;
-      int nonScanIteratorsSize = blockDocIdIterators.length - numScanBasedIterators;
-      this.docIdIterators = new BlockDocIdIterator[nonScanIteratorsSize];
-      this.scanBasedDocIdIterators = new ScanBasedDocIdIterator[numScanBasedIterators];
-      int nonScanBasedIndex = 0;
-      int scanBasedIndex = 0;
-      for (int i = 0; i < blockDocIdIterators.length; i++) {
-        if (blockDocIdIterators[i] instanceof ScanBasedDocIdIterator) {
-          this.scanBasedDocIdIterators[scanBasedIndex++] = (ScanBasedDocIdIterator) blockDocIdIterators[i];
-        } else {
-          this.docIdIterators[nonScanBasedIndex++] = blockDocIdIterators[i];
-        }
-      }
-    } else {
-      hasScanBasedIterators = false;
-      this.docIdIterators = blockDocIdIterators;
-    }
-    this.docIdPointers = new int[docIdIterators.length];
-    Arrays.fill(docIdPointers, -1);
-  }
+  private int _nextDocId = 0;
 
-  @Override
-  public int advance(int targetDocId) {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    if (currentDocId >= targetDocId) {
-      return currentDocId;
-    }
-    // next() method will always increment currentMax by 1.
-    currentMax = targetDocId - 1;
-    return next();
+  public AndDocIdIterator(BlockDocIdIterator[] docIdIterators) {
+    _docIdIterators = docIdIterators;
   }
 
   @Override
   public int next() {
-    if (currentDocId == Constants.EOF) {
-      return currentDocId;
-    }
-    currentMax = currentMax + 1;
-    // always increment the pointer to current max, when this is called first time, every one will
-    // be set to start of posting list.
-    for (int i = 0; i < docIdIterators.length; i++) {
-      docIdPointers[i] = docIdIterators[i].advance(currentMax);
-      if (docIdPointers[i] == Constants.EOF) {
-        reachedEnd = true;
-        currentMax = Constants.EOF;
-        break;
-      }
-      if (docIdPointers[i] > currentMax) {
-        currentMax = docIdPointers[i];
-        if (i > 0) {
-          // we need to advance all pointer since we found a new max
-          i = -1;
-        }
+    int maxDocId = _nextDocId;

Review comment:
       I prefer `AndDocIdIterator` because it performs an AND (intersection) operation on all the iterators, and also it is the `BlockDocIdIterator` for `AndDocIdSet`. Will add some javadoc showing the relation between them.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432695654



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/SVScanDocIdIterator.java
##########
@@ -18,238 +18,138 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import org.apache.pinot.core.common.BlockMetadata;
 import org.apache.pinot.core.common.BlockSingleValIterator;
-import org.apache.pinot.core.common.BlockValSet;
 import org.apache.pinot.core.common.Constants;
 import org.apache.pinot.core.operator.filter.predicate.PredicateEvaluator;
-import org.apache.pinot.spi.data.FieldSpec;
 import org.roaringbitmap.IntIterator;
 import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
 import org.roaringbitmap.buffer.MutableRoaringBitmap;
 
 
-public class SVScanDocIdIterator implements ScanBasedDocIdIterator {
-  private int _currentDocId = -1;
+public final class SVScanDocIdIterator implements ScanBasedDocIdIterator {
+  private final PredicateEvaluator _predicateEvaluator;
   private final BlockSingleValIterator _valueIterator;
-  private int _startDocId;
-  private int _endDocId;
-  private PredicateEvaluator _evaluator;
-  private String _operatorName;
-  private int _numEntriesScanned = 0;
+  private final int _numDocs;
   private final ValueMatcher _valueMatcher;
 
-  public SVScanDocIdIterator(String operatorName, BlockValSet blockValSet, BlockMetadata blockMetadata,
-      PredicateEvaluator evaluator) {
-    _operatorName = operatorName;
-    _evaluator = evaluator;
-    _valueIterator = (BlockSingleValIterator) blockValSet.iterator();
-
-    if (evaluator.isAlwaysFalse()) {
-      _currentDocId = Constants.EOF;
-      setStartDocId(Constants.EOF);
-      setEndDocId(Constants.EOF);
-    } else {
-      setStartDocId(blockMetadata.getStartDocId());
-      setEndDocId(blockMetadata.getEndDocId());
-    }
+  private int _nextDocId = 0;
+  private long _numEntriesScanned = 0L;
 
-    if (evaluator.isDictionaryBased()) {
-      _valueMatcher = new IntMatcher(); // Match using dictionary id's that are integers.
-    } else {
-      _valueMatcher = getValueMatcherForType(blockMetadata.getDataType());
-    }
-    _valueMatcher.setEvaluator(evaluator);
-  }
-
-  /**
-   * After setting the startDocId, next calls will always return from &gt;=startDocId
-   *
-   * @param startDocId Start doc id
-   */
-  public void setStartDocId(int startDocId) {
-    _currentDocId = startDocId - 1;
-    _valueIterator.skipTo(startDocId);
-    _startDocId = startDocId;
-  }
-
-  /**
-   * After setting the endDocId, next call will return Constants.EOF after currentDocId exceeds
-   * endDocId
-   *
-   * @param endDocId End doc id
-   */
-  public void setEndDocId(int endDocId) {
-    _endDocId = endDocId;
-  }
-
-  @Override
-  public boolean isMatch(int docId) {
-    if (_currentDocId == Constants.EOF) {
-      return false;
-    }
-    _valueIterator.skipTo(docId);
-    _numEntriesScanned++;
-    return _valueMatcher.doesCurrentEntryMatch(_valueIterator);
-  }
-
-  @Override
-  public int advance(int targetDocId) {
-    if (_currentDocId == Constants.EOF) {
-      return _currentDocId;
-    }
-    if (targetDocId < _startDocId) {
-      targetDocId = _startDocId;
-    } else if (targetDocId > _endDocId) {
-      _currentDocId = Constants.EOF;
-    }
-    if (_currentDocId >= targetDocId) {
-      return _currentDocId;
-    } else {
-      _currentDocId = targetDocId - 1;
-      _valueIterator.skipTo(targetDocId);
-      return next();
-    }
+  public SVScanDocIdIterator(PredicateEvaluator predicateEvaluator, BlockSingleValIterator valueIterator, int numDocs) {
+    _predicateEvaluator = predicateEvaluator;
+    _valueIterator = valueIterator;
+    _numDocs = numDocs;
+    _valueMatcher = getValueMatcher();
   }
 
   @Override
   public int next() {
-    if (_currentDocId == Constants.EOF) {
-      return Constants.EOF;
-    }
-    while (_valueIterator.hasNext() && _currentDocId < _endDocId) {
-      _currentDocId = _currentDocId + 1;
+    while (_nextDocId < _numDocs) {
+      int nextDocId = _nextDocId++;
       _numEntriesScanned++;
-      if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-        return _currentDocId;
+      if (_valueMatcher.doesNextValueMatch()) {
+        return nextDocId;
       }
     }
-    _currentDocId = Constants.EOF;
     return Constants.EOF;
   }
 
   @Override
-  public int currentDocId() {
-    return _currentDocId;
-  }
-
-  @Override
-  public String toString() {
-    return SVScanDocIdIterator.class.getSimpleName() + "[" + _operatorName + "]";
+  public int advance(int targetDocId) {
+    _nextDocId = targetDocId;
+    _valueIterator.skipTo(targetDocId);
+    return next();
   }
 
   @Override
   public MutableRoaringBitmap applyAnd(ImmutableRoaringBitmap docIds) {
     MutableRoaringBitmap result = new MutableRoaringBitmap();
-    if (_evaluator.isAlwaysFalse()) {
-      return result;
-    }
-    IntIterator intIterator = docIds.getIntIterator();
-    int docId = -1;
-    while (intIterator.hasNext() && docId < _endDocId) {
-      docId = intIterator.next();
-      if (docId >= _startDocId) {
-        _valueIterator.skipTo(docId);
-        _numEntriesScanned++;
-        if (_valueMatcher.doesCurrentEntryMatch(_valueIterator)) {
-          result.add(docId);
-        }
+    IntIterator docIdIterator = docIds.getIntIterator();
+    int nextDocId;
+    while (docIdIterator.hasNext() && (nextDocId = docIdIterator.next()) < _numDocs) {

Review comment:
       This IntIterator is from `ImmutableRoaringBitmap.getIntIterator()` where `docIdIterator.next()` does not check `hasNext()`.
   Our `BlockDocIdIterator` does not have `hasNext()` and always use `Constants.EOF` to notify the end of the iteration.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r434063565



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/ArrayBasedDocIdIterator.java
##########
@@ -36,37 +35,15 @@ public ArrayBasedDocIdIterator(int[] docIds, int searchableLength) {
 
   @Override
   public int next() {
-    if (_currentDocId == Constants.EOF) {
-      return Constants.EOF;
-    }
-    if (++_currentIndex == _searchableLength) {
-      _currentDocId = Constants.EOF;
+    if (_nextIndex < _searchableLength) {
+      return _docIds[_nextIndex++];

Review comment:
       Caller should ensure `_searchableLength <= _docIds.length`. If caller passes in wrong parameters, it is okay to through IndexOutOfBoundException. Similarly, we don't perform null check everywhere for simplicity and performance reason.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang merged pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang merged pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r431606505



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockValIterator.java
##########
@@ -20,9 +20,23 @@
 
 public interface BlockValIterator {
 
+  /**
+   * Returns {@code true} if there are more values in the iterator, {@code false} otherwise.
+   */
   boolean hasNext();
 
+  /**
+   * Returns the next document id to be read.
+   */
+  int getNextDocId();
+
+  /**
+   * Sets the next document id to the given document id.

Review comment:
       what is the behavior if the input doc id does not exist?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432700876



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdIterator.java
##########
@@ -25,25 +25,16 @@
 public interface BlockDocIdIterator {
 
   /**
-   * Get the next document id.
-   *
-   * @return Next document id or EOF if there is no more documents
+   * Returns the next matched document id, or {@link Constants#EOF} if there is no more matched document.
+   * <p>NOTE: There should be no more call to this method after it returns {@link Constants#EOF}.
    */
   int next();
 
   /**
-   * Advance to the first document whose id is equal or greater than the given target document id.
-   * <p>If the given target document id is smaller or equal to the current document id, then return the current one.
-   *
-   * @param targetDocId The target document id
-   * @return First document id that is equal or greater than target or EOF if no document matches
+   * Returns the first matched document whose id is equal to or greater than the given target document id, or
+   * {@link Constants#EOF} if there is no such document.
+   * <p>NOTE: The target document id should be greater than the document id previous returned.

Review comment:
       No, the `advance(targetDocId)` here is equivalent to `skipTo(targetDocId)` and `next()` where you already iterate over the `targetDocId`. From an iterator's perspective, it should not return the same value twice. With this assumption, we can save one if check inside the `advance(targetDocId)`.
   Updated the javadoc to make it more clear.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432331313



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.docidsets;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.pinot.common.utils.Pairs.IntPair;
+import org.apache.pinot.core.common.BlockDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.AndDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.BitmapBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.RangelessBitmapDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.ScanBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.SortedDocIdIterator;
+import org.apache.pinot.core.util.SortedRangeIntersection;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * The FilterBlockDocIdSet to perform AND on all child FilterBlockDocIdSets.
+ * <p>The AndBlockDocIdSet will construct the BlockDocIdIterator based on the BlockDocIdIterators from the child
+ * FilterBlockDocIdSets:
+ * <ul>
+ *   <li>
+ *     When there are at least one index-base BlockDocIdIterators (SortedDocIdIterator or BitmapBasedDocIdIterator) and
+ *     at least one ScanBasedDocIdIterator, or more than one index-based BlockDocIdIterators, merge them and construct a
+ *     RangelessBitmapDocIdIterator from the merged document ids. If there is no remaining BlockDocIdIterators, directly
+ *     return the merged RangelessBitmapDocIdIterator; otherwise, construct and return an AndDocIdIterator with the
+ *     merged RangelessBitmapDocIdIterator and the remaining BlockDocIdIterators.
+ *   </li>
+ *   <li>
+ *     Otherwise, construct and return an AndDocIdIterator with all BlockDocIdIterators.
+ *   </li>
+ * </ul>
+ */
+public final class AndDocIdSet implements FilterBlockDocIdSet {
+  private final List<FilterBlockDocIdSet> _docIdSets;
+
+  public AndDocIdSet(List<FilterBlockDocIdSet> docIdSets) {
+    _docIdSets = docIdSets;
+  }
+
+  @Override
+  public BlockDocIdIterator iterator() {
+    int numDocIdSets = _docIdSets.size();
+    // NOTE: Keep the order of FilterBlockDocIdSets to preserve the order decided within FilterOperatorUtils.
+    // TODO: Consider deciding the order based on the stats of BlockDocIdIterators
+    BlockDocIdIterator[] allDocIdIterators = new BlockDocIdIterator[numDocIdSets];
+    List<SortedDocIdIterator> sortedDocIdIterators = new ArrayList<>();

Review comment:
       These 3 (sorted, inv index, and scan) are basically for left or right leaf operators of AND. The `remainingDocIdIterators` is for non-leaves (child AND/OR) right? 
   
   On that note, for any sub-tree rooted at AND, there can be at-most one child with sorted iterator. Right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r432704090



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.operator.docidsets;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.pinot.common.utils.Pairs.IntPair;
+import org.apache.pinot.core.common.BlockDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.AndDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.BitmapBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.RangelessBitmapDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.ScanBasedDocIdIterator;
+import org.apache.pinot.core.operator.dociditerators.SortedDocIdIterator;
+import org.apache.pinot.core.util.SortedRangeIntersection;
+import org.roaringbitmap.buffer.ImmutableRoaringBitmap;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+/**
+ * The FilterBlockDocIdSet to perform AND on all child FilterBlockDocIdSets.
+ * <p>The AndBlockDocIdSet will construct the BlockDocIdIterator based on the BlockDocIdIterators from the child
+ * FilterBlockDocIdSets:
+ * <ul>
+ *   <li>
+ *     When there are at least one index-base BlockDocIdIterators (SortedDocIdIterator or BitmapBasedDocIdIterator) and
+ *     at least one ScanBasedDocIdIterator, or more than one index-based BlockDocIdIterators, merge them and construct a
+ *     RangelessBitmapDocIdIterator from the merged document ids. If there is no remaining BlockDocIdIterators, directly
+ *     return the merged RangelessBitmapDocIdIterator; otherwise, construct and return an AndDocIdIterator with the
+ *     merged RangelessBitmapDocIdIterator and the remaining BlockDocIdIterators.
+ *   </li>
+ *   <li>
+ *     Otherwise, construct and return an AndDocIdIterator with all BlockDocIdIterators.
+ *   </li>
+ * </ul>
+ */
+public final class AndDocIdSet implements FilterBlockDocIdSet {
+  private final List<FilterBlockDocIdSet> _docIdSets;
+
+  public AndDocIdSet(List<FilterBlockDocIdSet> docIdSets) {
+    _docIdSets = docIdSets;
+  }
+
+  @Override
+  public BlockDocIdIterator iterator() {
+    int numDocIdSets = _docIdSets.size();
+    // NOTE: Keep the order of FilterBlockDocIdSets to preserve the order decided within FilterOperatorUtils.
+    // TODO: Consider deciding the order based on the stats of BlockDocIdIterators
+    BlockDocIdIterator[] allDocIdIterators = new BlockDocIdIterator[numDocIdSets];
+    List<SortedDocIdIterator> sortedDocIdIterators = new ArrayList<>();

Review comment:
       For the current supported iterators, yes `remainingDocIdIterators` can only be AND/OR.
   
   There could be multiple sorted iterators if there are multiple predicates on the same sorted column or there are multiple sorted columns.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat edited a comment on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat edited a comment on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-635802473


   > @chenboat There is no new added major classes. Most of the changes are making the classes compatible with the filter interface change, so it is very hard to break them into multiple PRs. I can make the removed interface in BlockValIterator a separate PR as they are independent of the filtering.
   > This PR is the first step of re-structuring the filtering in Pinot, so for historical reason the name for some interfaces (e.g. BlockDocIdSet) might be confusing. I wouldn't spend too much time renaming and documenting them because they are going to be re-structured in the following steps.
   
   Thanks for reducing the number of changed files. Can you update the summary as well? I also realized some new classes are just renaming of previously existing classes -- my bad.
    
   I still think there is room for breaking up this PR. Based on your summary there are multiple things going on in this PR:
    > 1. Uniformed the behavior of all filter-related classes to bound the return docIds with numDocs
    > 2. Simplified the logic of AND/OR handling
    > 3. Pushed down ... 
   
   Can we put the 3 items in 3 PRs? I found there are non-trivial coding refactoring  (e.g., pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java) mixed with interface changes in this PR. Can we separate these into smaller PRs? 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-638597901


   > @mayankshriv Did a rough performance benchmark running all the queries in the query file for 10 times (100K queries in total) in Hybrid, MultiNodesOffline and Realtime cluster integration test and the result is very close before and after this PR (both within the range of 23-24 seconds).
   > So the gain of this PR is mostly for simplicity and readability of the code as well as enhancement for certain code path (e.g. the problem addressed in #5328 won't happen again)
   
   Thank you @Jackie-Jiang for performing this benchmark, much appreciated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#issuecomment-635802473


   > @chenboat There is no new added major classes. Most of the changes are making the classes compatible with the filter interface change, so it is very hard to break them into multiple PRs. I can make the removed interface in BlockValIterator a separate PR as they are independent of the filtering.
   > This PR is the first step of re-structuring the filtering in Pinot, so for historical reason the name for some interfaces (e.g. BlockDocIdSet) might be confusing. I wouldn't spend too much time renaming and documenting them because they are going to be re-structured in the following steps.
   
   Thanks for reducing the number of changed files. Can you update the summary as well? I also realized some new classes are just renaming of previously existing classes -- my bad.
    
   I still think there is room for breaking up this PR. Based on your summary there are multiple things going on in this PR:
    > 1. Uniformed the behavior of all filter-related classes to bound the return docIds with numDocs
    > 2. Simplified the logic of AND/OR handling
    > 3. Pushed down ...  
   Can we put the 3 items in 3 PRs? I found there are non-trivial coding refactoring  (e.g., pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java) mixed with interface changes in this PR. Can we separate these into smaller PRs? 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] chenboat commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
chenboat commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r431604799



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/common/BlockDocIdSet.java
##########
@@ -21,6 +21,4 @@
 public interface BlockDocIdSet {
 
   BlockDocIdIterator iterator();

Review comment:
       can you also explain why it is called block*?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #5444: Enhance and simplify the filtering

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #5444:
URL: https://github.com/apache/incubator-pinot/pull/5444#discussion_r434066870



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/dociditerators/AndDocIdIterator.java
##########
@@ -18,112 +18,52 @@
  */
 package org.apache.pinot.core.operator.dociditerators;
 
-import java.util.Arrays;
 import org.apache.pinot.core.common.BlockDocIdIterator;
 import org.apache.pinot.core.common.Constants;
 
 
-// TODO: Optimize this
 public final class AndDocIdIterator implements BlockDocIdIterator {

Review comment:
       It is not thread safety (the whole filtering layer does not involve thread safety because it is happening under one segment which is always processed by one thread). Also, I don't think multiple threads using one iterator make sense..




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org