You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/08 09:03:26 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new pull request #291: Support Full Join

Dandandan opened a new pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291


   # Which issue does this PR close?
   
   Closes #287
   
    # Rationale for this change
   
   Full join completes the support for the basic join types in DataFusion.
   
   # What changes are included in this PR?
   
   Add support for full join, via support for left/right join.
   
   # Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#issuecomment-835265123


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#291](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1d89dfd) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/a947f11a8c0558c7301931a6c35a07de396d2463?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a947f11) will **decrease** coverage by `0.05%`.
   > The diff coverage is `88.75%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/291/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #291      +/-   ##
   ==========================================
   - Coverage   76.16%   76.11%   -0.06%     
   ==========================================
     Files         140      140              
     Lines       23603    23705     +102     
   ==========================================
   + Hits        17978    18043      +65     
   - Misses       5625     5662      +37     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `40.45% <0.00%> (-0.06%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `67.96% <0.00%> (-0.10%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `47.34% <0.00%> (-0.22%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.94% <0.00%> (-0.17%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `82.01% <ø> (ø)` | |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `62.06% <0.00%> (-0.54%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `87.79% <92.98%> (+0.57%)` | :arrow_up: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.99% <100.00%> (ø)` | |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `100.00% <100.00%> (ø)` | |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.40% <100.00%> (+0.03%)` | :arrow_up: |
   | ... and [3 more](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a947f11...1d89dfd](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan merged pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan merged pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#discussion_r628874887



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1515,6 +1589,43 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn join_outer_one() -> Result<()> {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table(
+            ("a2", &vec![10, 20, 30]),
+            ("b1", &vec![4, 5, 6]),
+            ("c2", &vec![70, 80, 90]),
+        );
+        let on = &[("b1", "b1")];
+
+        let join = join(left, right, on, &JoinType::Full)?;
+
+        let columns = columns(&join.schema());
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+
+        let stream = join.execute(0).await?;
+        let batches = common::collect(stream).await?;
+
+        let expected = vec![
+            "+----+----+----+----+----+",
+            "| a1 | b1 | c1 | a2 | c2 |",
+            "+----+----+----+----+----+",
+            "|    |    |    | 30 | 90 |",

Review comment:
       this row should also have `b1=6` I think

##########
File path: ballista/rust/core/src/serde/physical_plan/to_proto.rs
##########
@@ -132,6 +132,7 @@ impl TryInto<protobuf::PhysicalPlanNode> for Arc<dyn ExecutionPlan> {
                 JoinType::Inner => protobuf::JoinType::Inner,
                 JoinType::Left => protobuf::JoinType::Left,
                 JoinType::Right => protobuf::JoinType::Right,
+                JoinType::Full => protobuf::JoinType::Right,

Review comment:
       This looks like it is a typo perhaps
   
   ```suggestion
                   JoinType::Full => protobuf::JoinType::Full,
   ```

##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1410,6 +1413,45 @@ mod tests {
         assert_batches_sorted_eq!(expected, &batches);
     }
 
+    #[tokio::test]
+    async fn join_outer_multi_batch() {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table_two_batches(
+            ("a2", &vec![10, 20, 30]),
+            ("b1", &vec![4, 5, 6]),
+            ("c2", &vec![70, 80, 90]),
+        );
+        let on = &[("b1", "b1")];
+
+        let join = join(left, right, on, &JoinType::Full).unwrap();
+
+        let columns = columns(&join.schema());
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+
+        let stream = join.execute(0).await.unwrap();
+        let batches = common::collect(stream).await.unwrap();
+
+        let expected = vec![

Review comment:
       This doesn't look like the correct answer to me. FULL OUTER JOIN basically keeps tuples from boths sides that don't match the join criteria.
   
   So we should have 2 rows for `b1=4' and `b1=5` and then 1 row for tuple on the left that has no match with the right, `b1=7` and 1 row for the tuple on the right that has no match with the left `b1=6`
   
   And indeed  postgres returns 4 rows as well
   
   
   ```
   alamb=# select * from tleft;
    a1 | b1 | c1 
   ----+----+----
     1 |  4 |  7
     2 |  5 |  8
     3 |  7 |  9
   (3 rows)
   
   alamb=# select * from tright;
    a2 | b2 | c2 
   ----+----+----
    10 |  4 | 70
     2 |  5 | 80
     3 |  6 | 90
   (3 rows)
   
   alamb=# select * from tleft FULL JOIN tright ON (tleft.b1 = tright.b2);
    a1 | b1 | c1 | a2 | b2 | c2 
   ----+----+----+----+----+----
     1 |  4 |  7 | 10 |  4 | 70
     2 |  5 |  8 |  2 |  5 | 80
       |    |    |  3 |  6 | 90
     3 |  7 |  9 |    |    |   
   (4 rows)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#discussion_r628886863



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1515,6 +1589,43 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn join_outer_one() -> Result<()> {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table(
+            ("a2", &vec![10, 20, 30]),
+            ("b1", &vec![4, 5, 6]),
+            ("c2", &vec![70, 80, 90]),
+        );
+        let on = &[("b1", "b1")];
+
+        let join = join(left, right, on, &JoinType::Full)?;
+
+        let columns = columns(&join.schema());
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+
+        let stream = join.execute(0).await?;
+        let batches = common::collect(stream).await?;
+
+        let expected = vec![
+            "+----+----+----+----+----+",
+            "| a1 | b1 | c1 | a2 | c2 |",
+            "+----+----+----+----+----+",
+            "|    |    |    | 30 | 90 |",

Review comment:
       I changed the test to use `b2` as column so you can see better what's going on:
   
   ```
               "+----+----+----+----+----+----+",
               "| a1 | b1 | c1 | a2 | b2 | c2 |",
               "+----+----+----+----+----+----+",
               "|    |    |    | 30 | 6  | 90 |",
               "| 1  | 4  | 7  | 10 | 4  | 70 |",
               "| 2  | 5  | 8  | 20 | 5  | 80 |",
               "| 3  | 7  | 9  |    |    |    |",
               "+----+----+----+----+----+----+",
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#discussion_r628876286



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1410,6 +1413,45 @@ mod tests {
         assert_batches_sorted_eq!(expected, &batches);
     }
 
+    #[tokio::test]
+    async fn join_outer_multi_batch() {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table_two_batches(
+            ("a2", &vec![10, 20, 30]),
+            ("b1", &vec![4, 5, 6]),
+            ("c2", &vec![70, 80, 90]),
+        );
+        let on = &[("b1", "b1")];
+
+        let join = join(left, right, on, &JoinType::Full).unwrap();
+
+        let columns = columns(&join.schema());
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+
+        let stream = join.execute(0).await.unwrap();
+        let batches = common::collect(stream).await.unwrap();
+
+        let expected = vec![

Review comment:
       The test here tests having two (equivalent) batches at the right side.
   So they will produce two rows for the right rows and one for the left row that doesnt match the right row.
   `tright` will look like this in your example:
   ```
    a2 | b2 | c2 
   ----+----+----
    10 |  4 | 70
    10 |  4 | 70
     2 |  5 | 80
     2 |  5 | 80
     3 |  6 | 90
     3 |  6 | 90
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#issuecomment-835803357


   @alamb I updated the tests and addressed your comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#issuecomment-835265123


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#291](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e617602) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/a947f11a8c0558c7301931a6c35a07de396d2463?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a947f11) will **decrease** coverage by `0.13%`.
   > The diff coverage is `88.75%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/291/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #291      +/-   ##
   ==========================================
   - Coverage   76.16%   76.03%   -0.14%     
   ==========================================
     Files         140      141       +1     
     Lines       23603    23730     +127     
   ==========================================
   + Hits        17978    18043      +65     
   - Misses       5625     5687      +62     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `40.45% <0.00%> (-0.06%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `67.96% <0.00%> (-0.10%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `47.34% <0.00%> (-0.22%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.94% <0.00%> (-0.17%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `82.01% <ø> (ø)` | |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `62.06% <0.00%> (-0.54%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `87.79% <92.98%> (+0.57%)` | :arrow_up: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.99% <100.00%> (ø)` | |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `100.00% <100.00%> (ø)` | |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.40% <100.00%> (+0.03%)` | :arrow_up: |
   | ... and [4 more](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a947f11...e617602](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#discussion_r628887212



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1410,6 +1413,45 @@ mod tests {
         assert_batches_sorted_eq!(expected, &batches);
     }
 
+    #[tokio::test]
+    async fn join_outer_multi_batch() {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table_two_batches(
+            ("a2", &vec![10, 20, 30]),
+            ("b1", &vec![4, 5, 6]),
+            ("c2", &vec![70, 80, 90]),
+        );
+        let on = &[("b1", "b1")];
+
+        let join = join(left, right, on, &JoinType::Full).unwrap();
+
+        let columns = columns(&join.schema());
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+
+        let stream = join.execute(0).await.unwrap();
+        let batches = common::collect(stream).await.unwrap();
+
+        let expected = vec![

Review comment:
       To me the results look correct given two equivalent right batches. I included `b2` as column here as well to make it a bit more obvious:
   
   ```
               "+----+----+----+----+----+----+",
               "| a1 | b1 | c1 | a2 | b2 | c2 |",
               "+----+----+----+----+----+----+",
               "|    |    |    | 30 | 6  | 90 |",
               "|    |    |    | 30 | 6  | 90 |",
               "| 1  | 4  | 7  | 10 | 4  | 70 |",
               "| 1  | 4  | 7  | 10 | 4  | 70 |",
               "| 2  | 5  | 8  | 20 | 5  | 80 |",
               "| 2  | 5  | 8  | 20 | 5  | 80 |",
               "| 3  | 7  | 9  |    |    |    |",
               "+----+----+----+----+----+----+",
   ```
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#issuecomment-835265123


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#291](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (21d8532) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/a947f11a8c0558c7301931a6c35a07de396d2463?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a947f11) will **decrease** coverage by `0.13%`.
   > The diff coverage is `89.15%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/291/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #291      +/-   ##
   ==========================================
   - Coverage   76.16%   76.03%   -0.14%     
   ==========================================
     Files         140      141       +1     
     Lines       23603    23733     +130     
   ==========================================
   + Hits        17978    18046      +68     
   - Misses       5625     5687      +62     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `40.45% <0.00%> (-0.06%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `67.96% <0.00%> (-0.10%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `47.34% <0.00%> (-0.22%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.94% <0.00%> (-0.17%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `82.01% <ø> (ø)` | |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `62.06% <0.00%> (-0.54%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `87.79% <92.98%> (+0.57%)` | :arrow_up: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.99% <100.00%> (ø)` | |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `100.00% <100.00%> (ø)` | |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.40% <100.00%> (+0.03%)` | :arrow_up: |
   | ... and [4 more](https://codecov.io/gh/apache/arrow-datafusion/pull/291/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [a947f11...21d8532](https://codecov.io/gh/apache/arrow-datafusion/pull/291?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#discussion_r628876661



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1515,6 +1589,43 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn join_outer_one() -> Result<()> {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table(
+            ("a2", &vec![10, 20, 30]),
+            ("b1", &vec![4, 5, 6]),
+            ("c2", &vec![70, 80, 90]),
+        );
+        let on = &[("b1", "b1")];
+
+        let join = join(left, right, on, &JoinType::Full)?;
+
+        let columns = columns(&join.schema());
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+
+        let stream = join.execute(0).await?;
+        let batches = common::collect(stream).await?;
+
+        let expected = vec![
+            "+----+----+----+----+----+",
+            "| a1 | b1 | c1 | a2 | c2 |",
+            "+----+----+----+----+----+",
+            "|    |    |    | 30 | 90 |",

Review comment:
       The `b1` column is here from the left side. I can make the test a bit easier to follow.
   
   I believe this should be covered by the work that @houqp is working on to support qualified names in tables (and disambiguating using the names).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#discussion_r628876286



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1410,6 +1413,45 @@ mod tests {
         assert_batches_sorted_eq!(expected, &batches);
     }
 
+    #[tokio::test]
+    async fn join_outer_multi_batch() {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table_two_batches(
+            ("a2", &vec![10, 20, 30]),
+            ("b1", &vec![4, 5, 6]),
+            ("c2", &vec![70, 80, 90]),
+        );
+        let on = &[("b1", "b1")];
+
+        let join = join(left, right, on, &JoinType::Full).unwrap();
+
+        let columns = columns(&join.schema());
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+
+        let stream = join.execute(0).await.unwrap();
+        let batches = common::collect(stream).await.unwrap();
+
+        let expected = vec![

Review comment:
       The test here tests having two (equivalent) batches at the right side.
   So they will produce two rows for the right rows and one for the left row that doesnt match the right row.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#issuecomment-836520864


   Thanks all for the reviews! :+1::+1::+1:
   
   @jorgecarleitao yeah really happy that this only involved adding some pattern matches and we could just reuse the code for right / left join!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #291: Support Full Join

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #291:
URL: https://github.com/apache/arrow-datafusion/pull/291#discussion_r629226166



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1590,36 +1591,36 @@ mod tests {
     }
 
     #[tokio::test]
-    async fn join_outer_one() -> Result<()> {
+    async fn join_full_one() -> Result<()> {
         let left = build_table(
             ("a1", &vec![1, 2, 3]),
             ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
             ("c1", &vec![7, 8, 9]),
         );
         let right = build_table(
             ("a2", &vec![10, 20, 30]),
-            ("b1", &vec![4, 5, 6]),
+            ("b2", &vec![4, 5, 6]),
             ("c2", &vec![70, 80, 90]),
         );
-        let on = &[("b1", "b1")];
+        let on = &[("b1", "b2")];
 
         let join = join(left, right, on, &JoinType::Full)?;
 
         let columns = columns(&join.schema());
-        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "c2"]);
+        assert_eq!(columns, vec!["a1", "b1", "c1", "a2", "b2", "c2"]);
 
         let stream = join.execute(0).await?;
         let batches = common::collect(stream).await?;
 
         let expected = vec![
-            "+----+----+----+----+----+",
-            "| a1 | b1 | c1 | a2 | c2 |",
-            "+----+----+----+----+----+",
-            "|    |    |    | 30 | 90 |",
-            "| 1  | 4  | 7  | 10 | 70 |",
-            "| 2  | 5  | 8  | 20 | 80 |",
-            "| 3  | 7  | 9  |    |    |",
-            "+----+----+----+----+----+",
+            "+----+----+----+----+----+----+",

Review comment:
       I see -- the output was only showing `b1` from `left` -- that was very confusing. Thank you for making it clearer

##########
File path: datafusion/tests/sql.rs
##########
@@ -1342,6 +1342,11 @@ async fn outer_join() -> Result<()> {
         vec!["44", "d", "x"],
     ];
     assert_eq!(expected, actual);
+
+    let sql = "SELECT t1_id, t1_name, t2_name FROM t1 FULL OUTER JOIN t2 ON t1_id = t2_id ORDER BY t1_id";

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org