Posted to github@arrow.apache.org by "Dandandan (via GitHub)" <gi...@apache.org> on 2023/06/16 14:32:50 UTC

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #6679: Improve performance/memory usage of HashJoin datastructure (5-15% improvement on selected TPC-H queries)

Dandandan commented on code in PR #6679:
URL: https://github.com/apache/arrow-datafusion/pull/6679#discussion_r1232334710


##########
datafusion/core/src/physical_plan/joins/symmetric_hash_join.rs:
##########
@@ -1119,6 +1159,144 @@ impl OneSideHashJoiner {
         Ok(())
     }
 
+    /// Gets build and probe indices which satisfy the on condition (including
+    /// the equality condition and the join filter) in the join.
+    #[allow(clippy::too_many_arguments)]
+    pub fn build_join_indices(

Review Comment:
   The old implementation was moved to the symmetric hash join. Supporting both options in a more generic way seems to add more complexity than just keeping the two versions around (and tuning each one further for its specific purpose / algorithm).


