Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/25 06:11:20 UTC

[GitHub] [arrow-datafusion] mingmwang opened a new issue, #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

mingmwang opened a new issue, #4363:
URL: https://github.com/apache/arrow-datafusion/issues/4363

   **Describe the bug**
   
   ````
   CREATE TEMPORARY TABLE t1 (t1_id INT,t1_name String);
   CREATE TEMPORARY TABLE t2 (t2_id INT,t2_name String);
   
   insert into t1 values (11, "a"), (22, "b"), (33, "c"), (44, "d");
   insert into t2 values (11, "z"), (22, "y"), (44, "x"), (55, "w");
   
   SELECT t1_id, t1_name, t2_name FROM t1 LEFT JOIN t2 ON (t1_id != t2_id and t2_id >= 100) ORDER BY t1_id;
   
   ````
   
   SparkSQL result:
   
               "+-------+---------+---------+",
               "| t1_id | t1_name | t2_name |",
               "+-------+---------+---------+",
               "| 11    | a       |         |",
               "| 22    | b       |         |",
               "| 33    | c       |         |",
               "| 44    | d       |         |",
               "+-------+---------+---------+",
   
   DataFusion unit test:
   
   ````
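   // `create_join_context` and `execute_to_batches` are assumed to be the
   // helpers from DataFusion's SQL integration tests.
   #[tokio::test]
   async fn error_cross_join() {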
       let test_repartition_joins = vec![true, false];
       for repartition_joins in test_repartition_joins {
           let ctx = create_join_context("t1_id", "t2_id", repartition_joins).unwrap();
   
           let sql = "SELECT t1_id, t1_name, t2_name FROM t1 LEFT JOIN t2 ON (t1_id != t2_id and t2_id >= 100) ORDER BY t1_id";
           let actual = execute_to_batches(&ctx, sql).await;
           let expected = vec![
               "+-------+---------+---------+",
               "| t1_id | t1_name | t2_name |",
               "+-------+---------+---------+",
               "| 11    | a       |         |",
               "| 22    | b       |         |",
               "| 33    | c       |         |",
               "| 44    | d       |         |",
               "+-------+---------+---------+",
           ];
   
           assert_batches_eq!(expected, &actual);
       }
   }
   
   actual:
   
   [
       "++",
       "++",
   ]
   
   ````
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] mingmwang commented on issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

Posted by GitBox <gi...@apache.org>.
mingmwang commented on issue #4363:
URL: https://github.com/apache/arrow-datafusion/issues/4363#issuecomment-1328659301

   I would prefer a separate join implementation. A distinct physical plan such as `nested_loop_join` makes it clear to users and developers that the plan is actually a nested loop join.




[GitHub] [arrow-datafusion] liukun4515 commented on issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #4363:
URL: https://github.com/apache/arrow-datafusion/issues/4363#issuecomment-1328158438

   > I think it is also possible to extend the HashJoin implementation with the new semantics rather than adding an entirely new physical join implementation
   
   Hash join only supports equality conditions. If there is no equality condition, we can't use hash join.
   
   For example, a condition such as `ON left_table.column1 > 10` can't be used for hashing.
   




[GitHub] [arrow-datafusion] alamb closed issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

Posted by GitBox <gi...@apache.org>.
alamb closed issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions 
URL: https://github.com/apache/arrow-datafusion/issues/4363




[GitHub] [arrow-datafusion] mingmwang commented on issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

Posted by GitBox <gi...@apache.org>.
mingmwang commented on issue #4363:
URL: https://github.com/apache/arrow-datafusion/issues/4363#issuecomment-1327062323

   Except for inner joins, we cannot treat join conditions as filter conditions (in some cases join conditions can be pushed down, depending on the join expression and the join type). We need to introduce a new physical join implementation, `nested_loop_join`, to fix this issue.
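
To illustrate the point above, here is a minimal, self-contained sketch (plain Rust, no DataFusion APIs) using the rows from the issue. `nested_loop_left_join`, `cross_join_then_filter`, and `on_condition` are hypothetical names made up for this example. It contrasts LEFT JOIN semantics, where unmatched left rows survive with a NULL right side, with a cross join followed by a filter, which is one plausible way to end up with the empty result reported in the issue.

```rust
// Illustrative sketch only: these helpers are hypothetical, not DataFusion code.
type Row = (i32, &'static str);

/// LEFT JOIN as a nested loop: the ON predicate decides which pairs match,
/// and a left row without any match is still emitted with `None` (NULL).
fn nested_loop_left_join<F>(left: &[Row], right: &[Row], on: F) -> Vec<(Row, Option<Row>)>
where
    F: Fn(&Row, &Row) -> bool,
{
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        for r in right {
            if on(l, r) {
                out.push((*l, Some(*r)));
                matched = true;
            }
        }
        if !matched {
            out.push((*l, None));
        }
    }
    out
}

/// Cross join with the ON predicate applied as a plain filter afterwards.
/// Unmatched left rows are dropped, which is only valid for inner joins.
fn cross_join_then_filter<F>(left: &[Row], right: &[Row], on: F) -> Vec<(Row, Row)>
where
    F: Fn(&Row, &Row) -> bool,
{
    let mut out = Vec::new();
    for l in left {
        for r in right {
            if on(l, r) {
                out.push((*l, *r));
            }
        }
    }
    out
}

/// The ON clause from the issue: t1_id != t2_id AND t2_id >= 100.
fn on_condition(l: &Row, r: &Row) -> bool {
    l.0 != r.0 && r.0 >= 100
}

fn t1() -> Vec<Row> {
    vec![(11, "a"), (22, "b"), (33, "c"), (44, "d")]
}

fn t2() -> Vec<Row> {
    vec![(11, "z"), (22, "y"), (44, "x"), (55, "w")]
}

fn main() {
    let joined = nested_loop_left_join(&t1(), &t2(), on_condition);
    for (l, r) in &joined {
        println!("{} {} {:?}", l.0, l.1, r.map(|r| r.1));
    }
    let filtered = cross_join_then_filter(&t1(), &t2(), on_condition);
    println!("cross join + filter rows: {}", filtered.len());
}
```

With this data no `t2_id` is at least 100, so the left join keeps all four `t1` rows with a NULL `t2_name`, while the cross join plus filter returns no rows at all.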
   




[GitHub] [arrow-datafusion] alamb commented on issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #4363:
URL: https://github.com/apache/arrow-datafusion/issues/4363#issuecomment-1328239347

   > Hash join only supports equality conditions. If there is no equality condition, we can't use hash join.
   
   > For example, a condition such as `ON left_table.column1 > 10` can't be used for hashing.
   
   I was thinking of something like using a hash table with a single entry when there are no join keys, which would effectively devolve into a CrossJoin.
   
   What I would really like to avoid, if possible, is another join implementation. Joins are already complicated enough, and adding another one will not improve the situation.
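
A rough sketch of the single-entry hash table idea, under stated assumptions: `hash_left_join`, `no_key`, and the key-extractor and residual-filter parameters are hypothetical and do not reflect DataFusion's actual HashJoin code. When there are no equi-join keys, every build-side row hashes to the same empty key, so each probe row sees the whole build side and the residual filter does all the work, while LEFT JOIN semantics are kept by emitting unmatched probe rows.

```rust
use std::collections::HashMap;

// Illustrative sketch only: hypothetical helper, not DataFusion's HashJoin.
type Row = (i32, &'static str);

/// Hash left join with equi-join key extractors plus a residual filter.
/// With no join keys, both extractors return an empty key, so the hash
/// table has a single entry that holds every build-side row.
fn hash_left_join(
    left: &[Row],
    right: &[Row],
    left_key: impl Fn(&Row) -> Vec<i32>,
    right_key: impl Fn(&Row) -> Vec<i32>,
    filter: impl Fn(&Row, &Row) -> bool,
) -> Vec<(Row, Option<Row>)> {
    // Build phase: group the right (build) side by key.
    let mut table: HashMap<Vec<i32>, Vec<Row>> = HashMap::new();
    for r in right {
        table.entry(right_key(r)).or_default().push(*r);
    }

    // Probe phase: look up each left row, then apply the residual filter.
    let mut out = Vec::new();
    for l in left {
        let mut matched = false;
        if let Some(bucket) = table.get(&left_key(l)) {
            for r in bucket {
                if filter(l, r) {
                    out.push((*l, Some(*r)));
                    matched = true;
                }
            }
        }
        // LEFT JOIN: keep probe rows that matched nothing.
        if !matched {
            out.push((*l, None));
        }
    }
    out
}

/// "No join keys": every row maps to the same empty key.
fn no_key(_: &Row) -> Vec<i32> {
    Vec::new()
}

fn t1() -> Vec<Row> {
    vec![(11, "a"), (22, "b"), (33, "c"), (44, "d")]
}

fn t2() -> Vec<Row> {
    vec![(11, "z"), (22, "y"), (44, "x"), (55, "w")]
}

fn main() {
    // The ON clause from the issue, evaluated purely as a residual filter.
    let out = hash_left_join(&t1(), &t2(), no_key, no_key, |l, r| {
        l.0 != r.0 && r.0 >= 100
    });
    for (l, r) in &out {
        println!("{} {} {:?}", l.0, l.1, r.map(|r| r.1));
    }
}
```

The trade-off discussed in the thread still applies: this reuses one operator, but the plan no longer makes it obvious that the work is effectively a nested loop.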
   




[GitHub] [arrow-datafusion] alamb commented on issue #4363: Should not convert a normal non-inner join to Cross Join when there are non-equal Join conditions

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #4363:
URL: https://github.com/apache/arrow-datafusion/issues/4363#issuecomment-1328043376

   I think it is also possible to extend the HashJoin implementation with the new semantics rather than adding an entirely new physical join implementation

