You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/21 13:22:36 UTC

[GitHub] [arrow-datafusion] ygf11 opened a new pull request, #4307: Add another api to create join plan

ygf11 opened a new pull request, #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307

   # Which issue does this PR close?
   
   Closes #4306.
   
   # Rationale for this change
   
   I find it is not easy to create join logical plan whose keys are expressions, because we have to create many additional projections. 
   
   This new method can reduce some boilerplate code.
   
   # What changes are included in this PR?
   
   * Introduce a `join_with_expr_keys` to LogicalPlanBuilder.
   * Using this method when create join logical plan.
   * Some unit tests.
   
   # Are these changes tested?
   
   Yes.
   
   # Are there any user-facing changes?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4307: Add another api to create join plan

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#discussion_r1028559854


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -458,6 +461,81 @@ impl LogicalPlanBuilder {
         self.join_detailed(right, join_type, join_keys, filter, false)
     }
 
+    /// Apply a join with on constraint which the key is an expression.
+    ///
+    /// Filter expression expected to contain non-equality predicates that can not be pushed
+    /// down to any of join inputs.
+    /// In case of outer join, filter applied to only matched rows.
+    pub fn join_with_expr_keys(
+        &self,
+        right: &LogicalPlan,
+        join_type: JoinType,
+        join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>),

Review Comment:
   Given all the logic I think we now have in the optimizer (e.g. eliminate cross join https://github.com/apache/arrow-datafusion/pull/4185) I recommend we *ONLY* support specifying `filter` on this function and leave it up to the optimizer pass to sort out which are eqijoins and which are not



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] xudong963 commented on a diff in pull request #4307: Add another api to create join plan

Posted by GitBox <gi...@apache.org>.
xudong963 commented on code in PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#discussion_r1028677521


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -458,6 +461,81 @@ impl LogicalPlanBuilder {
         self.join_detailed(right, join_type, join_keys, filter, false)
     }
 
+    /// Apply a join with on constraint which the key is an expression.
+    ///
+    /// Filter expression expected to contain non-equality predicates that can not be pushed
+    /// down to any of join inputs.
+    /// In case of outer join, filter applied to only matched rows.
+    pub fn join_with_expr_keys(
+        &self,
+        right: &LogicalPlan,
+        join_type: JoinType,
+        join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>),

Review Comment:
   I think so. -- We should "believe" our optimizer, If it doesn't perform well, we can polish it!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4307: Add another api to create join plan

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#discussion_r1028893527


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -458,6 +461,81 @@ impl LogicalPlanBuilder {
         self.join_detailed(right, join_type, join_keys, filter, false)
     }
 
+    /// Apply a join with on constraint which the key is an expression.
+    ///
+    /// Filter expression expected to contain non-equality predicates that can not be pushed
+    /// down to any of join inputs.
+    /// In case of outer join, filter applied to only matched rows.
+    pub fn join_with_expr_keys(
+        &self,
+        right: &LogicalPlan,
+        join_type: JoinType,
+        join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>),

Review Comment:
   Yeah, sort out by optimizer make sense to me. 
   
   But I have a question, transform join to `cross join` will lost join type, maybe we need a new rule to optimize join, not cross join?
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4307: Add another api to create join plan

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#discussion_r1028893527


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -458,6 +461,81 @@ impl LogicalPlanBuilder {
         self.join_detailed(right, join_type, join_keys, filter, false)
     }
 
+    /// Apply a join with on constraint which the key is an expression.
+    ///
+    /// Filter expression expected to contain non-equality predicates that can not be pushed
+    /// down to any of join inputs.
+    /// In case of outer join, filter applied to only matched rows.
+    pub fn join_with_expr_keys(
+        &self,
+        right: &LogicalPlan,
+        join_type: JoinType,
+        join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>),

Review Comment:
   Yeah, sort out by optimizer make sense to me. 
   
   But I have a question, transforming join to `cross join` will lost the join type, maybe we need a new rule to optimize join, not cross join?
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Add another api to create join plan [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed pull request #4307: Add another api to create join plan
URL: https://github.com/apache/arrow-datafusion/pull/4307


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4307: Add another api to create join plan

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#discussion_r1029390007


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -458,6 +461,81 @@ impl LogicalPlanBuilder {
         self.join_detailed(right, join_type, join_keys, filter, false)
     }
 
+    /// Apply a join with on constraint which the key is an expression.
+    ///
+    /// Filter expression expected to contain non-equality predicates that can not be pushed
+    /// down to any of join inputs.
+    /// In case of outer join, filter applied to only matched rows.
+    pub fn join_with_expr_keys(
+        &self,
+        right: &LogicalPlan,
+        join_type: JoinType,
+        join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>),

Review Comment:
   We should definitely test it, but I think the other passes will handle pushing predicates into Joins correctly -- for example https://github.com/apache/arrow-datafusion/blob/209c26670da4ee52cf13e4302250069a686fdaa2/datafusion/optimizer/src/filter_push_down.rs#L406-L443 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Add another api to create join plan [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#issuecomment-1830643790

   Closing as this PR is over a year old. Please feel free to reopen it / rebase it if you plan to keep working on it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4307: Add another api to create join plan

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#discussion_r1028893527


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -458,6 +461,81 @@ impl LogicalPlanBuilder {
         self.join_detailed(right, join_type, join_keys, filter, false)
     }
 
+    /// Apply a join with on constraint which the key is an expression.
+    ///
+    /// Filter expression expected to contain non-equality predicates that can not be pushed
+    /// down to any of join inputs.
+    /// In case of outer join, filter applied to only matched rows.
+    pub fn join_with_expr_keys(
+        &self,
+        right: &LogicalPlan,
+        join_type: JoinType,
+        join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>),

Review Comment:
   Yeah, sort out by optimizer make sense to me. 
   
   But I have a question, transform join to `cross join` will lost join type, maybe we need a new rule to optimize join, not cross join?
   
   BTW, the reason I originally wanted to add this was to facilitate writing tests in optimizer, seems it needs do more things I think :smile:.
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4307: Add another api to create join plan

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4307:
URL: https://github.com/apache/arrow-datafusion/pull/4307#discussion_r1030210612


##########
datafusion/expr/src/logical_plan/builder.rs:
##########
@@ -458,6 +461,81 @@ impl LogicalPlanBuilder {
         self.join_detailed(right, join_type, join_keys, filter, false)
     }
 
+    /// Apply a join with on constraint which the key is an expression.
+    ///
+    /// Filter expression expected to contain non-equality predicates that can not be pushed
+    /// down to any of join inputs.
+    /// In case of outer join, filter applied to only matched rows.
+    pub fn join_with_expr_keys(
+        &self,
+        right: &LogicalPlan,
+        join_type: JoinType,
+        join_keys: (Vec<impl Into<Expr>>, Vec<impl Into<Expr>>),

Review Comment:
   Thank you, got it. I will try it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org