You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "jasperjiaguo (via GitHub)" <gi...@apache.org> on 2023/07/27 22:24:59 UTC

[GitHub] [pinot] jasperjiaguo opened a new pull request, #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance

jasperjiaguo opened a new pull request, #11200:
URL: https://github.com/apache/pinot/pull/11200

   
   
   Instructions:
   1. The PR has to be tagged with at least one of the following labels (*):
      1. `feature`
      2. `bugfix`
      3. `performance`
      4. `ui`
      5. `backward-incompat`
      6. `release-notes` (**)
   2. Remove these instructions before publishing the PR.
    
   (*) Other labels to consider:
   - `testing`
   - `dependencies`
   - `docker`
   - `kubernetes`
   - `observability`
   - `security`
   - `code-style`
   - `extension-point`
   - `refactor`
   - `cleanup`
   
   (**) Use `release-notes` label for scenarios like:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #11200:
URL: https://github.com/apache/pinot/pull/11200#issuecomment-1654709066

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11200](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (97f776a) into [master](https://app.codecov.io/gh/apache/pinot/commit/2904f1bce19f93b4e6f2f6492e741a5adc02dd82?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (2904f1b) will **decrease** coverage by `0.12%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #11200      +/-   ##
   ==========================================
   - Coverage    0.11%    0.00%   -0.12%     
   ==========================================
     Files        2224     2208      -16     
     Lines      119351   118934     -417     
     Branches    18069    18014      -55     
   ==========================================
   - Hits          137        0     -137     
   + Misses     119194   118934     -260     
   + Partials       20        0      -20     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `0.00% <0.00%> (ø)` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files Changed](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/pinot/core/common/BlockDocIdSet.java](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9jb21tb24vQmxvY2tEb2NJZFNldC5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/core/operator/docidsets/AndDocIdSet.java](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9kb2NpZHNldHMvQW5kRG9jSWRTZXQuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../pinot/core/operator/docidsets/BitmapDocIdSet.java](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9kb2NpZHNldHMvQml0bWFwRG9jSWRTZXQuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/core/operator/docidsets/NotDocIdSet.java](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9kb2NpZHNldHMvTm90RG9jSWRTZXQuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/core/operator/docidsets/OrDocIdSet.java](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9kb2NpZHNldHMvT3JEb2NJZFNldC5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...re/operator/docidsets/RangelessBitmapDocIdSet.java](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9kb2NpZHNldHMvUmFuZ2VsZXNzQml0bWFwRG9jSWRTZXQuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...tion/BaseDistinctAggregateAggregationFunction.java](https://app.codecov.io/gh/apache/pinot/pull/11200?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9hZ2dyZWdhdGlvbi9mdW5jdGlvbi9CYXNlRGlzdGluY3RBZ2dyZWdhdGVBZ2dyZWdhdGlvbkZ1bmN0aW9uLmphdmE=) | `0.00% <0.00%> (ø)` | |
   
   ... and [18 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/11200/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1286251914


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctAggregateAggregationFunction.java:
##########
@@ -112,6 +113,9 @@ public Set merge(Set intermediateResult1, Set intermediateResult2) {
     if (intermediateResult2.isEmpty()) {
       return intermediateResult1;
     }
+
+    Tracing.ThreadAccountantOps.sampleAndCheckInterruption();

Review Comment:
   Good question. The complete stack trace is 
   ```
   Thread name: "brw-14", daemon: false
   java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.rehash(LongOpenHashSet.java:592)
     Local variables
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.tryCapacity(LongOpenHashSet.java:266)
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.addAll(LongOpenHashSet.java:283)
     Local variables
   org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction.merge(BaseDistinctAggregateAggregationFunction.java:115)
     Local variables
   org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction.merge(BaseDistinctAggregateAggregationFunction.java:47)
   org.apache.pinot.core.data.table.IndexedTable.lambda$addOrUpdateRecord$0(IndexedTable.java:113)
     Local variables
   org.apache.pinot.core.data.table.IndexedTable$$Lambda$1890.apply(, line not available)
   java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1932)
     Local variables
   org.apache.pinot.core.data.table.IndexedTable.addOrUpdateRecord(IndexedTable.java:105)
   org.apache.pinot.core.data.table.ConcurrentIndexedTable.upsertWithoutOrderBy(ConcurrentIndexedTable.java:77)
   ```
   So this BaseDistinctAggregateAggregationFunction.merge function is called in IndexedTable. addOrUpdateRecord in an iterative manner, therefore, we can call this once in merge and it will get updated multiple times in the outter loop. 
   
   The reason we cannot add more precise tracing is we cannot break set's addAll function and inject instrumentation code further inside it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia merged pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia merged PR #11200:
URL: https://github.com/apache/pinot/pull/11200


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1278079134


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,21 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        iterator.remove();
       } else if (docIdIterator instanceof BitmapBasedDocIdIterator) {
         bitmapBasedDocIdIterators.add((BitmapBasedDocIdIterator) docIdIterator);
+        // do not keep holding on to the bitmaps since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   Yes very good catch! Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1284700792


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,22 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   Added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1284932374


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctAggregateAggregationFunction.java:
##########
@@ -112,6 +113,9 @@ public Set merge(Set intermediateResult1, Set intermediateResult2) {
     if (intermediateResult2.isEmpty()) {
       return intermediateResult1;
     }
+
+    Tracing.ThreadAccountantOps.sampleAndCheckInterruption();

Review Comment:
   I am wondering if this will actually prevent the problem we saw happening internally.
   
   IIUC - Until this point in the code, merge hasn't consumed any memory. It's the addAll() and resulting rehash etc that can blow up memory ?
   
   Here is the exception from the heap dump.
   
   
   ```
   2. Thread throwing OutOfMemoryError.  found  [What's this?](https://aka.ms/jxray/#thread_throwing_oom)
   
   This dump was created after OutOfMemoryError in the following thread:
   
   Thread name: "brw-14", daemon: false
   java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.rehash(LongOpenHashSet.java:592)
     Local variables
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.tryCapacity(LongOpenHashSet.java:266)
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.addAll(LongOpenHashSet.java:283)
     Local variables
   org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction.merge(BaseDistinctAggregateAggregationFunction.java:115)
     Local variables
   org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction.merge(BaseDistinctAggregateAggregationFunction.java:47)
   
   ````
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1281199310


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,21 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        iterator.remove();
       } else if (docIdIterator instanceof BitmapBasedDocIdIterator) {
         bitmapBasedDocIdIterators.add((BitmapBasedDocIdIterator) docIdIterator);
+        // do not keep holding on to the bitmaps since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   As mentioned above, collecting `numEntriesScannedInFilter` when creating the iterator itself is wrong. Since we know these iterators can be applied before any scan happens, we can remove this step. Some comment explaining this would be good



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1283604033


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,22 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   Sounds good. Please add some comments explaining this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1278059576


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,21 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        iterator.remove();
       } else if (docIdIterator instanceof BitmapBasedDocIdIterator) {
         bitmapBasedDocIdIterators.add((BitmapBasedDocIdIterator) docIdIterator);
+        // do not keep holding on to the bitmaps since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   it is quite confusing when you update the `getNumEntriesScannedInFilter` here but not for the `SortedDocIdIterator` case where also you remove the iterator. From the code (please double check) it looks like the `SortedDocIdSet` returns 0:
   
   ```
     @Override
     public long getNumEntriesScannedInFilter() {
       return 0L;
     }
   ```
   
   Can we also call this API for that case to ensure that if ever the implementation changes in the future this doesn't result in a bug here since we don't update for all cases where we remove?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jasperjiaguo commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "jasperjiaguo (via GitHub)" <gi...@apache.org>.
jasperjiaguo commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1286251914


##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/BaseDistinctAggregateAggregationFunction.java:
##########
@@ -112,6 +113,9 @@ public Set merge(Set intermediateResult1, Set intermediateResult2) {
     if (intermediateResult2.isEmpty()) {
       return intermediateResult1;
     }
+
+    Tracing.ThreadAccountantOps.sampleAndCheckInterruption();

Review Comment:
   Good question. The complete stack trace is 
   ```
   Thread name: "brw-14", daemon: false
   java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:48)
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.rehash(LongOpenHashSet.java:592)
     Local variables
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.tryCapacity(LongOpenHashSet.java:266)
   it.unimi.dsi.fastutil.longs.LongOpenHashSet.addAll(LongOpenHashSet.java:283)
     Local variables
   org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction.merge(BaseDistinctAggregateAggregationFunction.java:115)
     Local variables
   org.apache.pinot.core.query.aggregation.function.BaseDistinctAggregateAggregationFunction.merge(BaseDistinctAggregateAggregationFunction.java:47)
   org.apache.pinot.core.data.table.IndexedTable.lambda$addOrUpdateRecord$0(IndexedTable.java:113)
     Local variables
   org.apache.pinot.core.data.table.IndexedTable$$Lambda$1890.apply(, line not available)
   java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1932)
     Local variables
   org.apache.pinot.core.data.table.IndexedTable.addOrUpdateRecord(IndexedTable.java:105)
   org.apache.pinot.core.data.table.ConcurrentIndexedTable.upsertWithoutOrderBy(ConcurrentIndexedTable.java:77)
   ```
   This BaseDistinctAggregateAggregationFunction.merge function is called in IndexedTable. addOrUpdateRecord in an iterative manner, therefore, we can call this once in merge and it will get updated multiple times in the outter loop. 
   
   The reason we cannot add more precise tracing is we cannot break set's addAll function and inject instrumentation code further inside it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1278059576


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,21 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        iterator.remove();
       } else if (docIdIterator instanceof BitmapBasedDocIdIterator) {
         bitmapBasedDocIdIterators.add((BitmapBasedDocIdIterator) docIdIterator);
+        // do not keep holding on to the bitmaps since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   it is quite confusing when you update the `getNumEntriesScannedInFilter` here but not for the `SortedDocIdIterator` case where also you remove the iterator. From the code (please double check) it looks like the `SortedDocIdSet` returns 0:
   
   ```
     @Override
     public long getNumEntriesScannedInFilter() {
       return 0L;
     }
   ```
   
   Can we also call this API for that case to ensure that if ever the implementation changes in the future this doesn't result in a bug here since we don't update for all cases where we remove? Same applies to `OrDocIdSet`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1281197936


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,22 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   I don't think we need to collect `numEntriesScannedInFilter` here. When creating the iterator, no scan has happened yet. Same for other places



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1278061432


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,21 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {

Review Comment:
   nit: I don't see you using `i` anymore in this block. Can this be converted to a while loop instead?



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,21 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {

Review Comment:
   nit: I don't see you using `i` anymore in this block. Can this be converted to a while loop instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] somandal commented on a diff in pull request #11200: Add instrumentation for dcount merge. Release bitmaps for and/or doc id set to save memory. `performance`

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on code in PR #11200:
URL: https://github.com/apache/pinot/pull/11200#discussion_r1278059576


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docidsets/AndDocIdSet.java:
##########
@@ -75,13 +77,21 @@ public BlockDocIdIterator iterator() {
     List<BitmapBasedDocIdIterator> bitmapBasedDocIdIterators = new ArrayList<>();
     List<ScanBasedDocIdIterator> scanBasedDocIdIterators = new ArrayList<>();
     List<BlockDocIdIterator> remainingDocIdIterators = new ArrayList<>();
-    for (int i = 0; i < numDocIdSets; i++) {
-      BlockDocIdIterator docIdIterator = _docIdSets.get(i).iterator();
+
+    Iterator<BlockDocIdSet> iterator = _docIdSets.iterator();
+    for (int i = 0; iterator.hasNext(); i++) {
+      BlockDocIdSet blockDocIdSet = iterator.next();
+      BlockDocIdIterator docIdIterator = blockDocIdSet.iterator();
       allDocIdIterators[i] = docIdIterator;
       if (docIdIterator instanceof SortedDocIdIterator) {
         sortedDocIdIterators.add((SortedDocIdIterator) docIdIterator);
+        // do not keep holding on to the _docIdRanges since they will occupy heap space during the query execution
+        iterator.remove();
       } else if (docIdIterator instanceof BitmapBasedDocIdIterator) {
         bitmapBasedDocIdIterators.add((BitmapBasedDocIdIterator) docIdIterator);
+        // do not keep holding on to the bitmaps since they will occupy heap space during the query execution
+        _numEntriesScannedInFilter += blockDocIdSet.getNumEntriesScannedInFilter();

Review Comment:
   it is quite confusing when you update the `getNumEntriesScannedInFilter` here but not for the `SortedDocIdIterator` case where also you remove the iterator. From the code (please double check) it looks like the `SortedDocIdSet` returns 0, so not updating it today will work and is not a correctness issue as such:
   
   ```
     @Override
     public long getNumEntriesScannedInFilter() {
       return 0L;
     }
   ```
   
   Can we also call this API for that case to ensure that if ever the implementation changes in the future this doesn't result in a bug here since we don't update for all cases where we remove? Same applies to `OrDocIdSet`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org