You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/21 18:34:10 UTC

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8720: ARROW-10585: [Rust] [DataFusion] Add join support to DataFrame and LogicalPlan

jorgecarleitao commented on a change in pull request #8720:
URL: https://github.com/apache/arrow/pull/8720#discussion_r528227594



##########
File path: rust/datafusion/src/logical_plan/plan.rs
##########
@@ -94,6 +101,21 @@ pub enum LogicalPlan {
         /// The incoming logical plan
         input: Arc<LogicalPlan>,
     },
+    /// Join two logical plans on one or more join columns
+    Join {
+        /// Left input
+        left: Arc<LogicalPlan>,
+        /// Right input
+        right: Arc<LogicalPlan>,
+        /// Columns in left input to use for join keys
+        left_keys: Vec<String>,
+        /// Columns in right input to use for join keys
+        right_keys: Vec<String>,

Review comment:
       I would use `Vec<(left_i, right_i)>` because it automatically enforces the invariant that `left_keys.len() == right_keys.len()`. We can still keep the public interface `left_keys, right_keys` and perform the check before passing them to the builder.
   
   Atm, when we use `left.zip(right)`, we take the shortest vector, which may hide a bug in the code.

##########
File path: rust/datafusion/src/physical_plan/hash_utils.rs
##########
@@ -29,7 +29,7 @@ pub enum JoinType {
 }
 
 /// The on clause of the join, as vector of (left, right) columns.
-pub type JoinOn<'a> = [(&'a str, &'a str)];
+pub type JoinOn = [(String, String)];

Review comment:
       This is too small compared to the others ops to justify the effort at this point, IMO.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org