You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/07/17 09:16:33 UTC

[GitHub] [pinot] gortiz commented on a diff in pull request #11112: V2 allocation optimizations

gortiz commented on code in PR #11112:
URL: https://github.com/apache/pinot/pull/11112#discussion_r1265085660


##########
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/HashJoinOperator.java:
##########
@@ -169,9 +169,14 @@ private void buildBroadcastHashTable() {
       }
       List<Object[]> container = rightBlock.getContainer();
       // put all the rows into corresponding hash collections keyed by the key selector function.
+      int initialHeuristicSize = 16;
       for (Object[] row : container) {
-        List<Object[]> hashCollection =
-            _broadcastRightTable.computeIfAbsent(new Key(_rightKeySelector.getKey(row)), k -> new ArrayList<>());
+        ArrayList<Object[]> hashCollection =

Review Comment:
   I won't care that much about adding a conditional here given the complexity of [HashMap.computeIfAbsent](https://github.com/openjdk/jdk/blob/acf591e856ce4b43303b1578bd64a8c9ab0063ea/src/java.base/share/classes/java/util/HashMap.java#L1195).
   
   > Also do you know if the JDK can do loop unrolling here?
   
   I don't know, but I would guess it doesn't. What we do here is too complex. We are creating a new instance that copy some data from an array (in another loop) then we lookup for that new object in the map and in case the value is not there we call a lambda to create the value of that key. After that we just add the element to the list.
   
   We can try to apply some extra optimizations here. For example we can use a lightweight version of Key that does not copy the array of keys but get a reference to the column and the same `_columnIndices` we use right now and uses that to calculate hash and equals. Therefore we wouldn't need to create heavier instances for each row. The main problem with this approach is that the hashCode and equals will be a bit slower and we would need to keep a reference to the original row. But the latter can be further optimized



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org