You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/13 13:48:01 UTC

[GitHub] [arrow-datafusion] ygf11 opened a new pull request, #4193: Support normal expressions in equality join

ygf11 opened a new pull request, #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193

   # Which issue does this PR close?
   
   Closes #4140.
   
   # Rationale for this change
   
   Currently datafusion will parse some joins as `cross join` when left or right join key contain normal expression.
   For example, `select * from test0 INNER JOIN test1 ON test0.c0 + 1 = test1.c1 * 2`.
   
   To improve performance, these joins should also parse as `join`.
   
   # What changes are included in this PR?
   
   * Support extract expression join key in `extract_join_keys`.
   * Add projection if needs when create logical plan of join. 
   
   # Are these changes tested?
   
   Yes
   
   # Are there any user-facing changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023876058


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   I do not see you have logic to trim the temp projected columns.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023914370


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   > I do not see you have logic to trim the temp projected columns.
   
   Yes, for `select *`, the final result will contains the additional columns.
   
   Do you means we need to add a new projection to trim the temp projected columns?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023872078


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   Sorry, do not get chance to take a look at the PR.
   Could you please add one more test case :
   
   ````
           let sql = "SELECT *
               FROM person \
               INNER JOIN orders \
               ON orders.customer_id * 2 = person.id + 10"
   
   ````
   This is to verify the new added projection will not introduce additional columns in the final result and make sure those temp projected columns will be trimmed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023890252


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_single_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + 10 = orders.customer_id * 2";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_multiple_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id * Int64(2) - orders.price\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_left_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id = orders.customer_id * Int64(2) - orders.price\
+        \n    TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+

Review Comment:
   > For example, I wonder if it is correct to always pull expressions below the join for FULL OUTER JOINs 🤔
   > 
   > Can you add a test for something like
   > 
   > ```
   >  let sql = "SELECT id, order_id \
   >             FROM person \
   >             FULL OUTER JOIN orders \
   >             ON person.id = 10";
   > ```
   > 
   > if it doesn't already exist?
   > 
   > In that case the predicates need to be evaluated within the join (if it is done as a filter afterwards it may filter rows that should not be filtered)
   
   I think it is OK to projection such kind expressions, the type of the new projected column type is bool, the bool value is evaluated again during the join.
   
   
   One thing that I am not clear now and whether we should cover in this PR and can do in the following PR is
   Should we allow the following cases ?
   
   // Join conditions include suspicious Exprs
   ```
   SELECT id, order_id 
                  FROM person 
                  FULL OUTER JOIN orders 
                  ON person.id = person.name = order.id
   ```
   
    
   // Join conditions include non-deterministic Exprs
   ```
   SELECT id, order_id 
                  FROM person 
                  FULL OUTER JOIN orders 
                  ON person.id = person.name || random(xxx)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on pull request #4193: Support normal expressions in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1313310353

   Thank you @mingmwang.
   
   > Not sure whether the two methods can be combined or not.
   
   The `extract_join_keys` is used to generate explicit join plan, and the `extract_possible_join_keys` is used to optimize `selection clause` from `cross join` to `inner join`.
   
   I think the logic of generating two join plan are similar, so we can try to combine them next.
   
   > From my understanding, I think the extract process should be
   
   Yes, you are right. The first two was done by original `extract_join_keys`, and I add the three to `extract_join_keys` in this pr.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023914370


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   > I do not see you have logic to trim the temp projected columns.
   Yes, for `select *`, the final result will contains the additional columns.
   
   Do you means we need to add a new projection to trim temp projected columns?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1319408175

   Sorry, I forget to submit the review.
   Then some description are `pending`, you do not see them.
   
   @alamb @mingmwang, any comment are welcome, I can fix them in another pr. Thank you.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023763326


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_single_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + 10 = orders.customer_id * 2";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_multiple_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id * Int64(2) - orders.price\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_left_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id = orders.customer_id * Int64(2) - orders.price\
+        \n    TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+

Review Comment:
   >  let sql = "SELECT id, order_id \
               FROM person \
               FULL OUTER JOIN orders \
               ON person.id = 10";
   
   currently I transform `this sql` to `cross join` as master does, but as you explained it may filter rows that should not be filtered, we should do same for this sql.
   
   Unfortunately, one test case will fail caused by different data type of join keys, after I add the change.
   https://github.com/apache/arrow-datafusion/blob/406c1087bc16f8d2a49e5a9b05d2a0e1b67f7aa5/datafusion/core/tests/sql/joins.rs#L369-L385
   
   I think we can fix it after we add `type_coercion` for join in optimizer rule.
   * https://github.com/apache/arrow-datafusion/issues/2877
   
   Any way, I add the full join test case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023763326


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_single_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + 10 = orders.customer_id * 2";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_multiple_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id * Int64(2) - orders.price\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_left_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id = orders.customer_id * Int64(2) - orders.price\
+        \n    TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+

Review Comment:
   >  let sql = "SELECT id, order_id \
               FROM person \
               FULL OUTER JOIN orders \
               ON person.id = 10";
   
   currently I transfer `this sql` to `cross join` as master does, but as you explained it may filter rows that should not be filtered, we should do same for this sql.
   
   Unfortunately, one test case will fail caused by different data type of join keys, after I add the change.
   https://github.com/apache/arrow-datafusion/blob/406c1087bc16f8d2a49e5a9b05d2a0e1b67f7aa5/datafusion/core/tests/sql/joins.rs#L369-L385
   
   I think we can fix it after we add `type_coercion` for join in optimizer rule.
   * https://github.com/apache/arrow-datafusion/issues/2877
   
   Any way, I add the full join test case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1315781509

   I restarted the failed CI job for this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023914370


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   > I do not see you have logic to trim the temp projected columns.
   
   Yes, for `select *`, the final result will contains the additional columns.
   
   Do you means we need to add a new projection to trim temp projected columns?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1318950028

   Benchmark runs are scheduled for baseline = 062acadd16cf0b474ca599d092cb2ebb03702114 and contender = 822022db88d4f70c2f02ac4d1828fc9413f3e252. 822022db88d4f70c2f02ac4d1828fc9413f3e252 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e41c55c978544efcbc03edea5284d16a...597271757d5847cca8e312e630cc48fd/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/335734182c434165bedc4f85e57fc29e...1ab62c4673cf4f019d4d345db512a52e/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/4c7793033cdf495b808ec6ea6c20f6a9...5990955be9d549bba7c29d5b7881bac7/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/1947106bfe3e45d0a1177adb1fe1a90a...d71323df2a9f48aea55ba97d33b20a21/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on pull request #4193: Support normal expressions in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1312757612

   What's the difference between `extract_join_keys() `and  `extract_possible_join_keys()`?
   Not sure whether the two methods can be combined or not.
   From my understanding, I think the extract process should be
   1. Convert the expr/predicate to CNF (I think we already had the utils)
   2. Split the CNF to Vec of expr/predicate (I think we already had the utils)
   3. For each expr in the split exprs, check if is a BinaryExpr with Operator::Eq and the left and right are coming from different input schemas.
   
   @alamb @jackwener 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023984136


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   Yes, they should be removed. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023872078


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   Sorry, do not get chance to take a look at the PR.
   @ygf11  Could you please add one more test case :
   
   ````
           let sql = "SELECT *
               FROM person \
               INNER JOIN orders \
               ON orders.customer_id * 2 = person.id + 10"
   
   ````
   This is to verify the new added projection will not introduce additional columns in the final result and make sure those temp projected columns will be trimmed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1317015649

   Please also add test case to verify this SQL to make sure the PR does not add unnecessary projection Exprs.
   
   ```
   let sql = "SELECT  orders.customer_id * 2,  person.id + 10
               FROM person 
               INNER JOIN orders 
               ON orders.customer_id * 2 = person.id + 10"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1024918161


##########
datafusion/sql/src/planner.rs:
##########
@@ -750,9 +746,46 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                         .unwrap_or(Ok(join))?
                         .build()
                 } else {
-                    LogicalPlanBuilder::from(left)
-                        .join(&right, join_type, (left_keys, right_keys), join_filter)?
-                        .build()
+                    // Wrap projection for left input if left join keys contain normal expression.
+                    let (left_child, left_projected) =
+                        wrap_projection_for_join_if_necessary(&left_keys, left)?;
+                    let left_join_keys = left_keys
+                        .iter()
+                        .map(|key| {
+                            key.try_into_col()
+                                .or_else(|_| Ok(Column::from_name(key.display_name()?)))
+                        })
+                        .collect::<Result<Vec<_>>>()?;
+
+                    // Wrap projection for right input if right join keys contains normal expression.
+                    let (right_child, right_projected) =
+                        wrap_projection_for_join_if_necessary(&right_keys, right)?;
+                    let right_join_keys = right_keys
+                        .iter()
+                        .map(|key| {
+                            key.try_into_col()
+                                .or_else(|_| Ok(Column::from_name(key.display_name()?)))
+                        })
+                        .collect::<Result<Vec<_>>>()?;
+
+                    let join_plan_builder = LogicalPlanBuilder::from(left_child).join(
+                        &right_child,
+                        join_type,
+                        (left_join_keys, right_join_keys),
+                        join_filter,
+                    )?;
+
+                    // Remove temporary projected columns if necessary.
+                    if left_projected || right_projected {
+                        let final_join_result = join_schema

Review Comment:
   Trim the temp projected columns here, and the test case is `test_select_all_inner_join`. 
   cc @mingmwang 



##########
datafusion/sql/src/planner.rs:
##########
@@ -3005,6 +3079,32 @@ fn extract_possible_join_keys(
     }
 }
 
+/// Wrap projection for a plan, if the join keys contains normal expression.
+fn wrap_projection_for_join_if_necessary(
+    join_keys: &[Expr],
+    input: LogicalPlan,
+) -> Result<(LogicalPlan, bool)> {
+    let expr_join_keys = join_keys
+        .iter()
+        .flat_map(|expr| expr.try_into_col().is_err().then_some(expr))
+        .cloned()
+        .collect::<HashSet<Expr>>();

Review Comment:
   When projection items contain duplicated expressions, the sql will failed.
   So I do deduplication here, relative test case is also added.
   
   cc @alamb @mingmwang 
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1320194508

   Thanks @ygf11  -- I think it looks good to me; Thanks again for all your help


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1318307973

   > Please also add test case to verify this SQL to make sure the PR does not add unnecessary projection Exprs.
   
   Added as `test_select_join_key_inner_join`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on pull request #4193: Support normal expressions in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#issuecomment-1313327125

   The failed ci is not relative to this pr. 
   
   @Dandandan  @alamb @mingmwang PTAL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1025465646


##########
datafusion/sql/src/planner.rs:
##########
@@ -5633,6 +5733,205 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered\
+        \n    Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n      Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n        TableScan: person\
+        \n      Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n        TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_single_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + 10 = orders.customer_id * 2";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered\
+        \n    Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n      Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n        TableScan: person\
+        \n      Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n        TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_multiple_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered\
+        \n    Inner Join: person.id + person.age + Int64(10) = orders.customer_id * Int64(2) - orders.price\
+        \n      Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n        TableScan: person\
+        \n      Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n        TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_left_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered\
+        \n    Inner Join: person.id + person.age + Int64(10) = orders.customer_id\
+        \n      Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n        TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered\
+        \n    Inner Join: person.id = orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: person\
+        \n      Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n        TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_one_side_constant_full_join() {
+        // TODO: this sql should be parsed as join after

Review Comment:
   👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
alamb merged PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
mingmwang commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023890252


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_single_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + 10 = orders.customer_id * 2";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_multiple_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id * Int64(2) - orders.price\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_left_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id = orders.customer_id * Int64(2) - orders.price\
+        \n    TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+

Review Comment:
   > For example, I wonder if it is correct to always pull expressions below the join for FULL OUTER JOINs 🤔
   > 
   > Can you add a test for something like
   > 
   > ```
   >  let sql = "SELECT id, order_id \
   >             FROM person \
   >             FULL OUTER JOIN orders \
   >             ON person.id = 10";
   > ```
   > 
   > if it doesn't already exist?
   > 
   > In that case the predicates need to be evaluated within the join (if it is done as a filter afterwards it may filter rows that should not be filtered)
   
   I think it is OK to projection such kind expressions, the type of the new projected column type is bool, the bool value is 
   evaluated again during the join.
   
   
   One thing that I am not clear now and whether we should cover in this PR and can do in the following PR is
   Should we allow the following cases ?
   
   // Join conditions include suspicious Exprs
   SELECT id, order_id 
                  FROM person 
                  FULL OUTER JOIN orders 
                  ON person.id = person.name = order.id
   
    
   // Join conditions include non-deterministic Exprs
   SELECT id, order_id 
                  FROM person 
                  FULL OUTER JOIN orders 
                  ON person.id = person.name || random(xxx)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
ygf11 commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023912724


##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_single_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + 10 = orders.customer_id * 2";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_multiple_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id * Int64(2) - orders.price\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_left_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id = orders.customer_id * Int64(2) - orders.price\
+        \n    TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+

Review Comment:
   > I think it is OK to projection such kind expressions, the type of the new projected column type is bool, the bool value is evaluated again during the join.
   
   It seems your idea make things simple.
   
   Currently for join condition like `person.id = 10`, I treat `person.id` as left key, and `10` as `right key`, this need do type coercion.
   
   But as you suggestion, we can treat `person.id = 10` as left key, and right key as `true`, 
   then exist `type coercion` will handle this condition.
   
   I rewrite the expression to `(person.id = 10) = true`, then the test passes now. 
   The full join test case will like:
   ```rust
   #[test]
       fn test_one_side_constant_full_join() {
           let sql = "SELECT id, order_id \
               FROM person \
               FULL OUTER JOIN orders \
               ON person.id = 10";
   
           let expected = "Projection: person.id, orders.order_id\
           \n  Filter: person.id = Int64(10) = Boolean(true)\
           \n    CrossJoin:\
           \n      TableScan: person\
           \n      TableScan: orders";
           quick_test(sql, expected);
       }
   ```
    
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4193: Support all equality predicates in equality join

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4193:
URL: https://github.com/apache/arrow-datafusion/pull/4193#discussion_r1023183597


##########
datafusion/sql/src/planner.rs:
##########
@@ -2837,36 +2861,75 @@ fn remove_join_expressions(
 /// foo = bar => accum=[(foo, bar)] accum_filter=[]
 /// foo = bar AND bar = baz => accum=[(foo, bar), (bar, baz)] accum_filter=[]
 /// foo = bar AND baz > 1 => accum=[(foo, bar)] accum_filter=[baz > 1]
+///
+/// For normal expression join key, assume we have tables -- a(c0, c1 c2) and b(c0, c1, c2):

Review Comment:
   I think this kind of join key is typically called an "equijoin" rather than "normal expression join key"
   
   Thank you for these comments



##########
datafusion/sql/src/planner.rs:
##########
@@ -2934,6 +2999,31 @@ fn parse_sql_number(n: &str) -> Result<Expr> {
         })
 }
 
+/// Wrap projection for a plan, if the join keys contains normal expression.
+fn wrap_projection_for_join_if_necessary(
+    join_keys: &[Expr],
+    input: LogicalPlan,
+) -> Result<LogicalPlan> {
+    let expr_join_keys = join_keys
+        .iter()
+        .flat_map(|expr| expr.try_into_col().is_err().then_some(expr))
+        .cloned()
+        .collect::<Vec<_>>();
+
+    let plan = if expr_join_keys.is_empty().not() {

Review Comment:
   I found the use of `not()` confusing here -- I think it would be more consistent to use
   
   
   ```suggestion
       let plan = if !expr_join_keys.is_empty() {
   ```



##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\

Review Comment:
   👍 



##########
datafusion/sql/src/planner.rs:
##########
@@ -5509,6 +5599,99 @@ mod tests {
         assert!(logical_plan("SELECT \"1\"").is_err());
     }
 
+    #[test]
+    fn test_constant_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Filter: person.id = Int64(10)\
+        \n    CrossJoin:\
+        \n      TableScan: person\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_left_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON orders.customer_id * 2 = person.id + 10";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_single_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + 10 = orders.customer_id * 2";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + Int64(10) = orders.customer_id * Int64(2)\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2)\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_multiple_column_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id * Int64(2) - orders.price\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_left_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id + person.age + 10 = orders.customer_id";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id + person.age + Int64(10) = orders.customer_id\
+        \n    Projection: person.id, person.first_name, person.last_name, person.age, person.state, person.salary, person.birth_date, person.😀, person.id + person.age + Int64(10)\
+        \n      TableScan: person\
+        \n    TableScan: orders";
+        quick_test(sql, expected);
+    }
+
+    #[test]
+    fn test_right_projection_expr_eq_join() {
+        let sql = "SELECT id, order_id \
+            FROM person \
+            INNER JOIN orders \
+            ON person.id = orders.customer_id * 2 - orders.price";
+
+        let expected = "Projection: person.id, orders.order_id\
+        \n  Inner Join: person.id = orders.customer_id * Int64(2) - orders.price\
+        \n    TableScan: person\
+        \n    Projection: orders.order_id, orders.customer_id, orders.o_item_id, orders.qty, orders.price, orders.delivered, orders.customer_id * Int64(2) - orders.price\
+        \n      TableScan: orders";
+        quick_test(sql, expected);
+    }
+

Review Comment:
   Very nice test !
   
   As I recall this type of transformation can not always be done for all types of joins (though perhaps it is ok for equijoins)
   
   For example, I wonder if it is correct to always pull expressions below the join for FULL OUTER JOINs 🤔 
   
   Can you add a test for something like
   
   ```
    let sql = "SELECT id, order_id \
               FROM person \
               FULL OUTER JOIN orders \
               ON person.id = 10";
   ```
   
   if it doesn't already exist?
   
   In that case the predicates need to be evaluated within the join (if it is done as a filter afterwards it may filter rows that should not be filtered)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org