
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/07/14 14:21:36 UTC

[GitHub] [pinot] gortiz opened a new pull request, #11112: V2 allocation optimizations

gortiz opened a new pull request, #11112:
URL: https://github.com/apache/pinot/pull/11112

   This PR applies some easy-win optimizations that try to reduce `Object[]` allocation in V2. The results are not as good as expected, because these changes mostly fix incorrect ArrayList initializations, while the real improvement would come from reducing how many ArrayLists are created. The latter is something I would like to achieve, but the changes required are not as simple as the ones included here.
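
For readers unfamiliar with the initialization issue: a minimal sketch, assuming a hypothetical helper class `PresizedCopy` that is not part of Pinot. In OpenJDK, `new ArrayList<>()` starts with an empty backing array and reallocates it as the list grows, whereas passing the expected size allocates the backing `Object[]` once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not code from this PR.
final class PresizedCopy {
  private PresizedCopy() {
  }

  // Default constructor: the backing array is grown (and copied) several
  // times while the rows are added.
  static List<Object[]> copyWithDefaultCapacity(List<Object[]> source) {
    List<Object[]> result = new ArrayList<>();
    for (Object[] row : source) {
      result.add(row);
    }
    return result;
  }

  // Presized: the final size is known up front, so a single backing array
  // of that length is allocated.
  static List<Object[]> copyPresized(List<Object[]> source) {
    List<Object[]> result = new ArrayList<>(source.size());
    for (Object[] row : source) {
      result.add(row);
    }
    return result;
  }
}
```

Both methods return the same contents; only the number of intermediate `Object[]` allocations differs.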
   
   Allocations pre change:
   ![image](https://github.com/apache/pinot/assets/1913993/22e34578-5ebc-4966-a3e5-55cfca06fc65)
   
   Allocations post change:
   ![image](https://github.com/apache/pinot/assets/1913993/d2b61c4e-ff19-45aa-8dcd-a3eb6f2e94ea)
   
   Notice that:
   * The improvement is mainly in the gray areas
   * Flamegraphs show relative performance
   * There is a difference in allocation outside the TLAB
   * Charts were generated after running queries for 5 minutes, but I only made one run per case, so I'm not sure about repeatability


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[GitHub] [pinot] gortiz commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "gortiz (via GitHub)" <gi...@apache.org>.

gortiz commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1265085660


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =

Review Comment:
   I wouldn't worry much about adding a conditional here, given the complexity of [HashMap.computeIfAbsent](https://github.com/openjdk/jdk/blob/acf591e856ce4b43303b1578bd64a8c9ab0063ea/src/java.base/share/classes/java/util/HashMap.java#L1195).
   
   > Also do you know if the JDK can do loop unrolling here?
   
   I don't know, but I would guess it can't. What we do here is too complex: we create a new instance that copies some data from an array (in another loop), then we look up that new object in the map, and if the value is not there we call a lambda to create the value for that key. After that, we add the element to the list.
   
   We can try to apply some extra optimizations here. For example, we could use a lightweight version of `Key` that does not copy the array of keys but instead holds a reference to the row and to the same `_columnIndices` we use right now, and uses them to compute `hashCode` and `equals`. That way we wouldn't need to create heavier instances for each row. The main problem with this approach is that `hashCode` and `equals` would be a bit slower, and we would need to keep a reference to the original row. But the latter can be further optimized.
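
A rough sketch of what such a lightweight key might look like. The class name `RowKey` and the `group` helper are hypothetical and not part of this PR; the sketch only assumes that a key is defined by the values of a row at a set of column indices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: reads the key columns straight from the row instead
// of copying them into a separate array.
final class RowKey {
  private final Object[] _row;
  private final int[] _columnIndices;

  RowKey(Object[] row, int[] columnIndices) {
    _row = row;
    _columnIndices = columnIndices;
  }

  @Override
  public int hashCode() {
    // Hash only the key columns, in the order given by _columnIndices.
    int hash = 1;
    for (int index : _columnIndices) {
      hash = 31 * hash + Objects.hashCode(_row[index]);
    }
    return hash;
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof RowKey)) {
      return false;
    }
    RowKey other = (RowKey) obj;
    if (_columnIndices.length != other._columnIndices.length) {
      return false;
    }
    // Each side resolves its own indices, so a probe row may use a
    // different column layout than the rows stored in the table.
    for (int i = 0; i < _columnIndices.length; i++) {
      if (!Objects.equals(_row[_columnIndices[i]], other._row[other._columnIndices[i]])) {
        return false;
      }
    }
    return true;
  }

  // Groups rows by their key columns without copying the key values.
  static Map<RowKey, List<Object[]>> group(List<Object[]> rows, int[] columnIndices) {
    Map<RowKey, List<Object[]>> table = new HashMap<>();
    for (Object[] row : rows) {
      table.computeIfAbsent(new RowKey(row, columnIndices), k -> new ArrayList<>()).add(row);
    }
    return table;
  }
}
```

The trade-off matches the one described above: a `RowKey` is still allocated per row, but it holds two references instead of a copied array, at the cost of an extra indirection in `hashCode` and `equals` and of keeping the original row reachable.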




[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.

Jackie-Jiang commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1263997923


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/query/SelectionOnlyOperator.java:
##########
@@ -75,6 +75,8 @@ public SelectionOnlyOperator(IndexSegment indexSegment, QueryContext queryContex
     _dataSchema = new DataSchema(columnNames, columnDataTypes);
 
     _numRowsToKeep = queryContext.getLimit();
+    // TODO(gortiz): I think this is incorrect. The SelectionOperatorUtils.MAX_ROW_HOLDER_INITIAL_CAPACITY limit
+    //  is not enforced later in getNextBlock

Review Comment:
   We need to limit the initial list size because the list might not be filled up. In many scenarios `LIMIT` can be very large (it could be `Integer.MAX_VALUE`).
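
To illustrate the point, a minimal sketch of capping the initial capacity. The class name and the cap value are assumptions made for this example; the actual value of `SelectionOperatorUtils.MAX_ROW_HOLDER_INITIAL_CAPACITY` is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the SelectionOnlyOperator code.
final class BoundedInitialCapacity {
  // Assumed cap, chosen only for illustration.
  static final int MAX_INITIAL_CAPACITY = 10_000;

  private BoundedInitialCapacity() {
  }

  // Presize for small limits, but never reserve more than the cap up front.
  static int initialCapacity(int limit) {
    return Math.min(limit, MAX_INITIAL_CAPACITY);
  }

  // The list can still grow past the cap if more rows are actually added.
  static List<Object[]> newRowHolder(int limit) {
    return new ArrayList<>(initialCapacity(limit));
  }
}
```

Without the cap, `new ArrayList<>(Integer.MAX_VALUE)` would try to allocate a backing array of that length immediately, even if the query returns only a few rows.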




[GitHub] [pinot] codecov-commenter commented on pull request #11112: V2 allocation optimizations

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.

codecov-commenter commented on PR #11112:
URL: https://github.com/apache/pinot/pull/11112#issuecomment-1635997734

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/11112?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#11112](https://app.codecov.io/gh/apache/pinot/pull/11112?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (66bd4da) into [master](https://app.codecov.io/gh/apache/pinot/commit/35d7a8faa5e969602a637a68f158c51fdc594c38?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (35d7a8f) will **decrease** coverage by `0.01%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #11112      +/-   ##
   ==========================================
   - Coverage    0.11%    0.11%   -0.01%     
   ==========================================
     Files        2203     2203              
     Lines      118165   118192      +27     
     Branches    17879    17882       +3     
   ==========================================
     Hits          137      137              
   - Misses     118008   118035      +27     
     Partials       20       20              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `0.00% <0.00%> (ø)` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin17 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin20 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/11112?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...not/core/operator/query/SelectionOnlyOperator.java](https://app.codecov.io/gh/apache/pinot/pull/11112?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9xdWVyeS9TZWxlY3Rpb25Pbmx5T3BlcmF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...pinot/query/runtime/operator/HashJoinOperator.java](https://app.codecov.io/gh/apache/pinot/pull/11112?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcXVlcnktcnVudGltZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcXVlcnkvcnVudGltZS9vcGVyYXRvci9IYXNoSm9pbk9wZXJhdG9yLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...inot/query/runtime/operator/TransformOperator.java](https://app.codecov.io/gh/apache/pinot/pull/11112?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcXVlcnktcnVudGltZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcXVlcnkvcnVudGltZS9vcGVyYXRvci9UcmFuc2Zvcm1PcGVyYXRvci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [.../query/runtime/operator/exchange/HashExchange.java](https://app.codecov.io/gh/apache/pinot/pull/11112?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcXVlcnktcnVudGltZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcXVlcnkvcnVudGltZS9vcGVyYXRvci9leGNoYW5nZS9IYXNoRXhjaGFuZ2UuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   



[GitHub] [pinot] gortiz commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "gortiz (via GitHub)" <gi...@apache.org>.

gortiz commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1265087149


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -213,57 +218,97 @@ private TransferableBlock buildJoinedDataBlock(TransferableBlock leftBlock)
       _isTerminated = true;
       return new TransferableBlock(returnRows, _resultSchema, DataBlock.Type.ROW);
     }
-    List<Object[]> rows = new ArrayList<>();
-    List<Object[]> container = leftBlock.isEndOfStreamBlock() ? new ArrayList<>() : leftBlock.getContainer();
-    for (Object[] leftRow : container) {
-      Key key = new Key(_leftKeySelector.getKey(leftRow));
+    List<Object[]> rows;
+    if (leftBlock.isEndOfStreamBlock()) {
+      rows = new ArrayList<>();
+    } else {
       switch (_joinType) {
-        case SEMI:
-          // SEMI-JOIN only checks existence of the key
-          if (_broadcastRightTable.containsKey(key)) {
-            rows.add(joinRow(leftRow, null));
-          }
+        case SEMI: {
+          rows = buildJoinedDataBlockSemi(leftBlock);
           break;
-        case ANTI:
-          // ANTI-JOIN only checks non-existence of the key
-          if (!_broadcastRightTable.containsKey(key)) {
-            rows.add(joinRow(leftRow, null));
-          }
+        }
+        case ANTI: {
+          rows = buildJoinedDataBlockAnti(leftBlock);
           break;
-        default: // INNER, LEFT, RIGHT, FULL
-          // NOTE: Empty key selector will always give same hash code.
-          List<Object[]> matchedRightRows = _broadcastRightTable.getOrDefault(key, null);
-          if (matchedRightRows == null) {
-            if (needUnmatchedLeftRows()) {
-              rows.add(joinRow(leftRow, null));
-            }
-            continue;
-          }
-          boolean hasMatchForLeftRow = false;
-          for (int i = 0; i < matchedRightRows.size(); i++) {
-            Object[] rightRow = matchedRightRows.get(i);
-            // TODO: Optimize this to avoid unnecessary object copy.
-            Object[] resultRow = joinRow(leftRow, rightRow);
-            if (_joinClauseEvaluators.isEmpty() || _joinClauseEvaluators.stream().allMatch(
-                evaluator -> (Boolean) TypeUtils.convert(evaluator.apply(resultRow),
-                    DataSchema.ColumnDataType.BOOLEAN))) {
-              rows.add(resultRow);
-              hasMatchForLeftRow = true;
-              if (_matchedRightRows != null) {
-                HashSet<Integer> matchedRows = _matchedRightRows.computeIfAbsent(key, k -> new HashSet<>());
-                matchedRows.add(i);
-              }
-            }
-          }
-          if (!hasMatchForLeftRow && needUnmatchedLeftRows()) {
-            rows.add(joinRow(leftRow, null));
-          }
+        }
+        default: {
+          rows = buildJoinedDataBlockDefault(leftBlock);
           break;
+        }
       }
     }
     return new TransferableBlock(rows, _resultSchema, DataBlock.Type.ROW);
   }
 
+  // INNER, LEFT, RIGHT, FULL

Review Comment:
   This is an original comment associated with the older switch default case. I'm going to move it to the current default case.
   
   For readers: the changes to the switch mostly move the code into one method per case in order to help the JIT. There is no guarantee that this improves performance, but it won't make it worse.
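
The shape of that refactor, as a simplified sketch. `JoinDispatch` and its methods are hypothetical and operate on plain keys rather than on `TransferableBlock`; the idea is only that the switch dispatches once and each case body lives in its own small method, which HotSpot compiles and considers for inlining separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified join over plain keys; not the HashJoinOperator code.
final class JoinDispatch {
  enum JoinType { SEMI, ANTI }

  private JoinDispatch() {
  }

  // The switch only dispatches; the per-row loops are in separate methods.
  static List<Object> join(JoinType type, List<Object> leftKeys, Set<Object> rightKeys) {
    switch (type) {
      case SEMI:
        return semi(leftKeys, rightKeys);
      case ANTI:
        return anti(leftKeys, rightKeys);
      default:
        throw new IllegalArgumentException("Unsupported join type: " + type);
    }
  }

  // SEMI keeps the left keys that exist on the right side.
  private static List<Object> semi(List<Object> leftKeys, Set<Object> rightKeys) {
    List<Object> rows = new ArrayList<>(leftKeys.size());
    for (Object key : leftKeys) {
      if (rightKeys.contains(key)) {
        rows.add(key);
      }
    }
    return rows;
  }

  // ANTI keeps the left keys that do not exist on the right side.
  private static List<Object> anti(List<Object> leftKeys, Set<Object> rightKeys) {
    List<Object> rows = new ArrayList<>(leftKeys.size());
    for (Object key : leftKeys) {
      if (!rightKeys.contains(key)) {
        rows.add(key);
      }
    }
    return rows;
  }
}
```

Compared with a switch inside the row loop, the join type is also checked once per block instead of once per row.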




[GitHub] [pinot] ankitsultana commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.

ankitsultana commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1264012455


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =

Review Comment:
   This loop could likely be called 100s of millions of times per second for moderately high qps (10-100). Would adding an additional branch here offset the benefits of running `ensureCapacity`?
   
   Also do you know if the JDK can do loop unrolling here?



##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =
+            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>(16));

Review Comment:
   Running `new Key()` would create a new object in each iteration. If this loop is run millions of times per second, and assuming an object reference is at least 8 bytes, then this could add MBs of data on the TLAB which would likely lead to slow allocations, and also these would be unnecessary short-lived objects.
   
   Do you think we can change the FieldSelector to return a `Key` directly?
   
   Or we could re-use the `Key` object and add a setter to it, so that we reuse the reference and only update the inner reference with a new reference.




[GitHub] [pinot] ankitsultana commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.

ankitsultana commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1263996593


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;

Review Comment:
   nit: unused. Should this be a final constant declared at class level?




[GitHub] [pinot] ankitsultana commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.

ankitsultana commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1264014292


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =
+            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>(16));

Review Comment:
   Running `new Key()` would create a new object in each iteration. If this loop is run millions of times per second, and assuming an object reference is at least 8 bytes, then this could add MBs of data on the TLAB which would likely lead to slow allocations, and also these would be unnecessary short-lived objects.
   
   Do you think we can change the FieldSelector to return a `Key` directly? (edit: this won't likely work either based on how the code is laid out.. looks inevitable)
   
   ~~Or we could re-use the `Key` object and add a setter to it, so that we reuse the reference and only update the inner reference with a new reference.~~ (my bad this won't work obviously)
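
A small sketch of why the reuse idea does not work, assuming a hypothetical `MutableKey` with a setter (not a class from this PR). A `HashMap` stores a reference to the key object, so mutating a shared key after insertion changes the hash and equality of entries already in the map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demonstration; MutableKey is not a Pinot class.
final class MutableKeyPitfall {
  static final class MutableKey {
    private Object[] _values;

    void set(Object[] values) {
      _values = values;
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(_values);
    }

    @Override
    public boolean equals(Object obj) {
      return obj instanceof MutableKey && Arrays.equals(_values, ((MutableKey) obj)._values);
    }
  }

  private MutableKeyPitfall() {
  }

  // Counts occurrences per key while reusing a single key instance.
  // Every map entry ends up holding the same key object, so entries
  // inserted under earlier values can no longer be looked up reliably.
  static Map<MutableKey, Integer> countWithSharedKey(Object[][] keys) {
    Map<MutableKey, Integer> counts = new HashMap<>();
    MutableKey shared = new MutableKey();
    for (Object[] key : keys) {
      shared.set(key);
      counts.merge(shared, 1, Integer::sum);
    }
    return counts;
  }
}
```

Looking up a fresh key equal to an earlier value fails, because the stored key object now carries whatever values were set last. The key has to be a distinct object per entry, at least for rows that create a new entry.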




[GitHub] [pinot] ankitsultana commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.

ankitsultana commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1264014292


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =
+            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>(16));

Review Comment:
   Running `new Key()` would create a new object in each iteration. If this loop is run millions of times per second, and assuming an object reference is at least 8 bytes, then this could add MBs of data on the TLAB which would likely lead to slow allocations, and also these would be unnecessary short-lived objects.
   
   Do you think we can change the FieldSelector to return a `Key` directly?
   
   ~~Or we could re-use the `Key` object and add a setter to it, so that we reuse the reference and only update the inner reference with a new reference.~~ (my bad this won't work obviously)




[GitHub] [pinot] gortiz commented on pull request #11112: V2 allocation optimizations

Posted by "gortiz (via GitHub)" <gi...@apache.org>.

gortiz commented on PR #11112:
URL: https://github.com/apache/pinot/pull/11112#issuecomment-1635937097

   cc @walterddr @Jackie-Jiang 



[GitHub] [pinot] walterddr commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "walterddr (via GitHub)" <gi...@apache.org>.

walterddr commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1263916123


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -213,57 +218,97 @@ private TransferableBlock buildJoinedDataBlock(TransferableBlock leftBlock)
       _isTerminated = true;
       return new TransferableBlock(returnRows, _resultSchema, DataBlock.Type.ROW);
     }
-    List<Object[]> rows = new ArrayList<>();
-    List<Object[]> container = leftBlock.isEndOfStreamBlock() ? new ArrayList<>() : leftBlock.getContainer();
-    for (Object[] leftRow : container) {
-      Key key = new Key(_leftKeySelector.getKey(leftRow));
+    List<Object[]> rows;
+    if (leftBlock.isEndOfStreamBlock()) {
+      rows = new ArrayList<>();
+    } else {
       switch (_joinType) {
-        case SEMI:
-          // SEMI-JOIN only checks existence of the key
-          if (_broadcastRightTable.containsKey(key)) {
-            rows.add(joinRow(leftRow, null));
-          }
+        case SEMI: {
+          rows = buildJoinedDataBlockSemi(leftBlock);
           break;
-        case ANTI:
-          // ANTI-JOIN only checks non-existence of the key
-          if (!_broadcastRightTable.containsKey(key)) {
-            rows.add(joinRow(leftRow, null));
-          }
+        }
+        case ANTI: {
+          rows = buildJoinedDataBlockAnti(leftBlock);
           break;
-        default: // INNER, LEFT, RIGHT, FULL
-          // NOTE: Empty key selector will always give same hash code.
-          List<Object[]> matchedRightRows = _broadcastRightTable.getOrDefault(key, null);
-          if (matchedRightRows == null) {
-            if (needUnmatchedLeftRows()) {
-              rows.add(joinRow(leftRow, null));
-            }
-            continue;
-          }
-          boolean hasMatchForLeftRow = false;
-          for (int i = 0; i < matchedRightRows.size(); i++) {
-            Object[] rightRow = matchedRightRows.get(i);
-            // TODO: Optimize this to avoid unnecessary object copy.
-            Object[] resultRow = joinRow(leftRow, rightRow);
-            if (_joinClauseEvaluators.isEmpty() || _joinClauseEvaluators.stream().allMatch(
-                evaluator -> (Boolean) TypeUtils.convert(evaluator.apply(resultRow),
-                    DataSchema.ColumnDataType.BOOLEAN))) {
-              rows.add(resultRow);
-              hasMatchForLeftRow = true;
-              if (_matchedRightRows != null) {
-                HashSet<Integer> matchedRows = _matchedRightRows.computeIfAbsent(key, k -> new HashSet<>());
-                matchedRows.add(i);
-              }
-            }
-          }
-          if (!hasMatchForLeftRow && needUnmatchedLeftRows()) {
-            rows.add(joinRow(leftRow, null));
-          }
+        }
+        default: {
+          rows = buildJoinedDataBlockDefault(leftBlock);
           break;
+        }
       }
     }
     return new TransferableBlock(rows, _resultSchema, DataBlock.Type.ROW);
   }
 
+  // INNER, LEFT, RIGHT, FULL

Review Comment:
   Is this comment meant for the next private method?




[GitHub] [pinot] Jackie-Jiang merged pull request #11112: V2 allocation optimizations

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.

Jackie-Jiang merged PR #11112:
URL: https://github.com/apache/pinot/pull/11112



[GitHub] [pinot] walterddr commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "walterddr (via GitHub)" <gi...@apache.org>.

walterddr commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1263915511


##########
pinot-core/src/main/java/org/apache/pinot/core/operator/query/SelectionOnlyOperator.java:
##########
@@ -75,6 +75,8 @@ public SelectionOnlyOperator(IndexSegment indexSegment, QueryContext queryContex
     _dataSchema = new DataSchema(columnNames, columnDataTypes);
 
     _numRowsToKeep = queryContext.getLimit();
+    // TODO(gortiz): I think this is incorrect. The SelectionOperatorUtils.MAX_ROW_HOLDER_INITIAL_CAPACITY limit
+    //  is not enforced later in getNextBlock

Review Comment:
   This hint is only used during leaf initialization. I don't think there's any plan to create another default for intermediate select/project, but we can see if that helps.




[GitHub] [pinot] gortiz commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "gortiz (via GitHub)" <gi...@apache.org>.

gortiz commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1265087149


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -213,57 +218,97 @@ private TransferableBlock buildJoinedDataBlock(TransferableBlock leftBlock)
       _isTerminated = true;
       return new TransferableBlock(returnRows, _resultSchema, DataBlock.Type.ROW);
     }
-    List<Object[]> rows = new ArrayList<>();
-    List<Object[]> container = leftBlock.isEndOfStreamBlock() ? new ArrayList<>() : leftBlock.getContainer();
-    for (Object[] leftRow : container) {
-      Key key = new Key(_leftKeySelector.getKey(leftRow));
+    List<Object[]> rows;
+    if (leftBlock.isEndOfStreamBlock()) {
+      rows = new ArrayList<>();
+    } else {
       switch (_joinType) {
-        case SEMI:
-          // SEMI-JOIN only checks existence of the key
-          if (_broadcastRightTable.containsKey(key)) {
-            rows.add(joinRow(leftRow, null));
-          }
+        case SEMI: {
+          rows = buildJoinedDataBlockSemi(leftBlock);
           break;
-        case ANTI:
-          // ANTI-JOIN only checks non-existence of the key
-          if (!_broadcastRightTable.containsKey(key)) {
-            rows.add(joinRow(leftRow, null));
-          }
+        }
+        case ANTI: {
+          rows = buildJoinedDataBlockAnti(leftBlock);
           break;
-        default: // INNER, LEFT, RIGHT, FULL
-          // NOTE: Empty key selector will always give same hash code.
-          List<Object[]> matchedRightRows = _broadcastRightTable.getOrDefault(key, null);
-          if (matchedRightRows == null) {
-            if (needUnmatchedLeftRows()) {
-              rows.add(joinRow(leftRow, null));
-            }
-            continue;
-          }
-          boolean hasMatchForLeftRow = false;
-          for (int i = 0; i < matchedRightRows.size(); i++) {
-            Object[] rightRow = matchedRightRows.get(i);
-            // TODO: Optimize this to avoid unnecessary object copy.
-            Object[] resultRow = joinRow(leftRow, rightRow);
-            if (_joinClauseEvaluators.isEmpty() || _joinClauseEvaluators.stream().allMatch(
-                evaluator -> (Boolean) TypeUtils.convert(evaluator.apply(resultRow),
-                    DataSchema.ColumnDataType.BOOLEAN))) {
-              rows.add(resultRow);
-              hasMatchForLeftRow = true;
-              if (_matchedRightRows != null) {
-                HashSet<Integer> matchedRows = _matchedRightRows.computeIfAbsent(key, k -> new HashSet<>());
-                matchedRows.add(i);
-              }
-            }
-          }
-          if (!hasMatchForLeftRow && needUnmatchedLeftRows()) {
-            rows.add(joinRow(leftRow, null));
-          }
+        }
+        default: {
+          rows = buildJoinedDataBlockDefault(leftBlock);
           break;
+        }
       }
     }
     return new TransferableBlock(rows, _resultSchema, DataBlock.Type.ROW);
   }
 
+  // INNER, LEFT, RIGHT, FULL

Review Comment:
   This comment belonged to the old `default` case of the switch and should be deleted.
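
   For readers following the refactor in the hunk above: the removed SEMI and ANTI branches both reduce to a key-existence check against the right-side table. The sketch below is only an illustration of that idea, not the PR's code. `SemiAntiJoinSketch`, `filterByKey`, `rightKeys`, and `keyIndex` are hypothetical names, the right side is modeled as a plain `Set` instead of `_broadcastRightTable`, the left row is returned as-is instead of going through `joinRow`, and pre-sizing the result list is an assumption about what a correct `ArrayList` initialization could look like here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SemiAntiJoinSketch {
  // Hypothetical helper: keepMatches = true gives SEMI semantics (keep rows whose key exists
  // on the right side), keepMatches = false gives ANTI semantics (keep rows whose key does not).
  static List<Object[]> filterByKey(List<Object[]> leftRows, Set<Object> rightKeys, int keyIndex,
      boolean keepMatches) {
    // Assumption: size the list for the upper bound (every left row kept) so the backing
    // Object[] never has to be reallocated while rows are added.
    List<Object[]> rows = new ArrayList<>(leftRows.size());
    for (Object[] leftRow : leftRows) {
      if (rightKeys.contains(leftRow[keyIndex]) == keepMatches) {
        rows.add(leftRow);
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Object[]> left = Arrays.asList(new Object[]{1, "x"}, new Object[]{2, "y"}, new Object[]{3, "z"});
    Set<Object> rightKeys = new HashSet<>(Arrays.<Object>asList(1, 3));
    System.out.println("semi rows: " + filterByKey(left, rightKeys, 0, true).size());
    System.out.println("anti rows: " + filterByKey(left, rightKeys, 0, false).size());
  }
}
```

   Pre-sizing trades a possibly oversized backing array for zero resizes; whether that is a net win depends on join selectivity, which this sketch does not measure.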




[GitHub] [pinot] ankitsultana commented on a diff in pull request #11112: V2 allocation optimizations

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.

ankitsultana commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1264014292


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =
+            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>(16));

Review Comment:
   Calling `new Key(...)` allocates a new object on every iteration. If this loop runs millions of times per second, and assuming each object takes at least 8 bytes, this could add MBs of data to the TLAB, which would likely slow down allocation. These would also be unnecessary short-lived objects.
   
   Do you think we can change the FieldSelector to return a `Key` directly? (edit: this likely won't work either, given how the code is laid out; the allocation looks unavoidable)
   
   ~~Or we could reuse the `Key` object and add a setter to it, so that we keep a single `Key` instance and only update its inner reference.~~ (my bad, this obviously won't work)
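
   To make the retracted idea concrete, here is a minimal sketch of why a reused, mutable key breaks `computeIfAbsent`: the map stores the key reference itself, so mutating it afterwards silently changes entries that are already in the table. `MutableKeyReuseDemo`, `MutableKey`, and `buildWithReusedKey` are hypothetical stand-ins written for this example; they are not the operator's actual `Key` class or build loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutableKeyReuseDemo {
  // Hypothetical stand-in for a Key with a setter: wraps the key columns of a row.
  static final class MutableKey {
    private Object[] _values;

    void set(Object[] values) {
      _values = values;
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof MutableKey && Arrays.equals(_values, ((MutableKey) o)._values);
    }

    @Override
    public int hashCode() {
      return Arrays.hashCode(_values);
    }
  }

  // Groups rows by their first column while reusing ONE key instance for every row.
  static Map<MutableKey, List<Object[]>> buildWithReusedKey(List<Object[]> rows) {
    Map<MutableKey, List<Object[]>> table = new HashMap<>();
    MutableKey reused = new MutableKey();
    for (Object[] row : rows) {
      reused.set(new Object[]{row[0]});
      // The map keeps a reference to `reused`, so the next set() call also rewrites
      // the key of every entry that was inserted earlier.
      table.computeIfAbsent(reused, k -> new ArrayList<>()).add(row);
    }
    return table;
  }

  public static void main(String[] args) {
    List<Object[]> rows = Arrays.asList(new Object[]{"a", 1}, new Object[]{"b", 2}, new Object[]{"a", 3});
    Map<MutableKey, List<Object[]>> table = buildWithReusedKey(rows);
    MutableKey probe = new MutableKey();
    probe.set(new Object[]{"b"});
    System.out.println("entries=" + table.size() + ", foundB=" + table.containsKey(probe));
  }
}
```

   In this sketch the lookup for `"b"` fails even though a `"b"` row was inserted, because the stored key object now holds the values from the last iteration. A fresh, effectively immutable key per inserted entry avoids this, which is why the per-row allocation is hard to remove on the build side.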


