Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/03 13:20:47 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5754: Improving optimizer performance by eliminating unnecessary sort and distribution requirements

alamb commented on code in PR #5754:
URL: https://github.com/apache/arrow-datafusion/pull/5754#discussion_r1155944850


##########
datafusion/core/src/execution/context.rs:
##########
@@ -1293,9 +1293,6 @@ impl SessionState {
             // repartitioning and local sorting steps to meet distribution and ordering requirements.
             // Therefore, it should run before EnforceDistribution and EnforceSorting.
             Arc::new(JoinSelection::new()),
-            // Enforce sort before PipelineFixer

Review Comment:
   👍 



##########
datafusion/common/src/config.rs:
##########
@@ -280,6 +280,10 @@ config_namespace! {
         /// using the provided `target_partitions` level
         pub repartition_joins: bool, default = true
 
+        /// Should DataFusion allow symmetric hash joins for unbounded data sources even when
+        /// its inputs do not have any ordering or filtering
+        pub allow_symmetric_joins_without_pruning: bool, default = true

Review Comment:
   I don't understand how a symmetric hash join could generate correct results when the inputs don't have any ordering 🤔  Maybe we can add some additional comments about the circumstances under which one would enable or disable this option.
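
   For context on the question above, here is a toy sketch of a symmetric hash join with no pruning. It is not DataFusion's implementation; `SymmetricHashJoin`, `Side`, `push`, `buffered_rows` and `run` are hypothetical names made up for illustration, and the join is an inner equi-join on an integer key. Under those assumptions, every arriving row is buffered on its own side and probed against the other side, so the set of emitted pairs does not depend on arrival order. What an ordering or filter would add is the ability to discard buffered rows; without it, the buffered state only grows.

```rust
use std::collections::HashMap;

/// Which input a row arrived on.
#[derive(Clone, Copy)]
enum Side {
    Left,
    Right,
}

/// Toy symmetric hash join on an integer key. Buffered rows are never pruned.
#[derive(Default)]
struct SymmetricHashJoin {
    left: HashMap<i64, Vec<String>>,
    right: HashMap<i64, Vec<String>>,
}

impl SymmetricHashJoin {
    /// Buffer the row on its own side, then probe the other side's table.
    /// Returns the (left, right) pairs produced by this arrival.
    fn push(&mut self, side: Side, key: i64, payload: &str) -> Vec<(String, String)> {
        match side {
            Side::Left => {
                self.left.entry(key).or_default().push(payload.to_string());
                self.right
                    .get(&key)
                    .into_iter()
                    .flatten()
                    .map(|r| (payload.to_string(), r.clone()))
                    .collect()
            }
            Side::Right => {
                self.right.entry(key).or_default().push(payload.to_string());
                self.left
                    .get(&key)
                    .into_iter()
                    .flatten()
                    .map(|l| (l.clone(), payload.to_string()))
                    .collect()
            }
        }
    }

    /// Number of rows currently held in both hash tables.
    fn buffered_rows(&self) -> usize {
        let left: usize = self.left.values().map(Vec::len).sum();
        let right: usize = self.right.values().map(Vec::len).sum();
        left + right
    }
}

/// Feed a sequence of arrivals through the join.
/// Returns the sorted output pairs and the final buffered row count.
fn run(arrivals: &[(Side, i64, &str)]) -> (Vec<(String, String)>, usize) {
    let mut join = SymmetricHashJoin::default();
    let mut out = Vec::new();
    for &(side, key, payload) in arrivals {
        out.extend(join.push(side, key, payload));
    }
    out.sort();
    (out, join.buffered_rows())
}

fn main() {
    // The same four rows in two different arrival orders.
    let in_order: [(Side, i64, &str); 4] = [
        (Side::Left, 1, "l1"),
        (Side::Right, 1, "r1"),
        (Side::Left, 2, "l2"),
        (Side::Right, 2, "r2"),
    ];
    let shuffled: [(Side, i64, &str); 4] = [
        (Side::Right, 2, "r2"),
        (Side::Left, 1, "l1"),
        (Side::Left, 2, "l2"),
        (Side::Right, 1, "r1"),
    ];

    let (out_a, buffered_a) = run(&in_order);
    let (out_b, buffered_b) = run(&shuffled);

    // Same join result either way, and every input row is still buffered.
    assert_eq!(out_a, out_b);
    assert_eq!(buffered_a, buffered_b);
    println!("{} pairs emitted, {} rows buffered", out_a.len(), buffered_a);
}
```

   If this matches what the option permits, the doc comment could say that disabling it trades the ability to run such joins on unbounded inputs for a guarantee that join state stays bounded. That reading is an assumption based on the option name, not something stated in the diff.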



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
