You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/01 15:33:37 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new pull request #470: Support semi join

Dandandan opened a new pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470


   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #461 
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Adds semi join support. This will enable us to rewrite `WHERE id IN (..)` and `WHERE exists` to a join.
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Add semi join to hash join implementation.
   What's not in here
   
   * Semi join sql / dataframe syntax
   * Using the semi join in the planner for in/exists
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   No
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-852238261


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#470](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9cb007a) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/c8ab5a4f00fc8b362eed72d5feb43b03b8ad1fdd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8ab5a4) will **increase** coverage by `0.50%`.
   > The diff coverage is `77.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/470/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #470      +/-   ##
   ==========================================
   + Coverage   75.30%   75.80%   +0.50%     
   ==========================================
     Files         152      153       +1     
     Lines       25275    25924     +649     
   ==========================================
   + Hits        19033    19652     +619     
   - Misses       6242     6272      +30     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `36.17% <0.00%> (-0.05%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `62.32% <0.00%> (-0.08%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `39.63% <0.00%> (-0.15%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.46% <0.00%> (-0.16%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.08% <0.00%> (-0.39%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `80.62% <ø> (+0.52%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.32% <0.00%> (-0.14%)` | :arrow_down: |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `58.53% <20.00%> (-1.97%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `86.45% <97.29%> (+0.39%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `98.50% <100.00%> (+0.02%)` | :arrow_up: |
   | ... and [18 more](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c8ab5a4...9cb007a](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644334211



##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       I rewrote it a bit to use a function in the body




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-852238261


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#470](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ff1388f) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/c8ab5a4f00fc8b362eed72d5feb43b03b8ad1fdd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8ab5a4) will **increase** coverage by `0.54%`.
   > The diff coverage is `82.60%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/470/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #470      +/-   ##
   ==========================================
   + Coverage   75.30%   75.85%   +0.54%     
   ==========================================
     Files         152      153       +1     
     Lines       25275    25910     +635     
   ==========================================
   + Hits        19033    19653     +620     
   - Misses       6242     6257      +15     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `36.17% <0.00%> (-0.05%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `62.32% <0.00%> (-0.08%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `39.63% <0.00%> (-0.15%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.46% <0.00%> (-0.16%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.08% <0.00%> (-0.39%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `80.62% <ø> (+0.52%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.32% <0.00%> (-0.14%)` | :arrow_down: |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `60.83% <50.00%> (+0.32%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `86.45% <97.29%> (+0.39%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `98.50% <100.00%> (+0.02%)` | :arrow_up: |
   | ... and [16 more](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c8ab5a4...ff1388f](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-852238261


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#470](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (592fd83) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/c8ab5a4f00fc8b362eed72d5feb43b03b8ad1fdd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8ab5a4) will **increase** coverage by `0.12%`.
   > The diff coverage is `82.60%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/470/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #470      +/-   ##
   ==========================================
   + Coverage   75.30%   75.42%   +0.12%     
   ==========================================
     Files         152      153       +1     
     Lines       25275    25717     +442     
   ==========================================
   + Hits        19033    19397     +364     
   - Misses       6242     6320      +78     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `36.17% <0.00%> (-0.05%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `62.32% <0.00%> (-0.08%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `39.63% <0.00%> (-0.15%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.46% <0.00%> (-0.16%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.08% <0.00%> (-0.39%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `80.10% <ø> (ø)` | |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.32% <0.00%> (-0.14%)` | :arrow_down: |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `60.83% <50.00%> (+0.32%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `86.45% <97.29%> (+0.39%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `98.50% <100.00%> (+0.02%)` | :arrow_up: |
   | ... and [5 more](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c8ab5a4...592fd83](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-853266206


   > should the readme be updated to include semi join as well?
   
   I think for now we don't - I would say that at least need to have `in` or `exists` to be mapped to a semi join, as currently there is not really a way for anyone to create it (other than creating a `LogicalPlan` explicitly).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644268005



##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1666,6 +1689,41 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn join_semi() -> Result<()> {
+        let left = build_table(
+            ("a1", &vec![1, 2, 3]),
+            ("b1", &vec![4, 5, 7]), // 7 does not exist on the right
+            ("c1", &vec![7, 8, 9]),
+        );
+        let right = build_table(

Review comment:
       It may be worth testing with record batches that have duplicates on the `left` as well (to ensure the duplicates on the left are produced) ?

##########
File path: datafusion/src/physical_plan/hash_join.rs
##########
@@ -1109,22 +1116,34 @@ pub fn create_hashes<'a>(
 }
 
 // Produces a batch for left-side rows that are not marked as being visited during the whole join

Review comment:
       ```suggestion
   // Produces a batch for left-side rows that have/have not been matched during the whole join
   ```

##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       should you also possibly conditionalize this on being a SemiJoin? like
   
   ```suggestion
               | LogicalPlan::Join if *join_type == JoinType::Semi { .. }
   ```
   
   So if we add a new join type in the future it doesn't accidentally get allowed / disallowed in some confusing way
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644354781



##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -106,6 +106,13 @@ fn should_swap_join_order(left: &LogicalPlan, right: &LogicalPlan) -> bool {
     }
 }
 
+fn supports_swap(join_type: JoinType) -> bool {

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644287511



##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       Tried that in earlier commit... But than the exhaustiveness checking gets in the way unfortunately.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644354781



##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -106,6 +106,13 @@ fn should_swap_join_order(left: &LogicalPlan, right: &LogicalPlan) -> bool {
     }
 }
 
+fn supports_swap(join_type: JoinType) -> bool {

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644331235



##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       I think the only way the compiler can do it now is by explicitly matching on all the join types, but as this feature is not added https://github.com/rust-lang/rfcs/blob/master/text/2535-or-patterns.md that would be quite verbose currently

##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       I rewrote it a bit to use a function in the body

##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       I rewrote it a bit to use a function in the body of the join match




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-852238261


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#470](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9cb007a) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/c8ab5a4f00fc8b362eed72d5feb43b03b8ad1fdd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8ab5a4) will **increase** coverage by `0.50%`.
   > The diff coverage is `77.55%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/470/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #470      +/-   ##
   ==========================================
   + Coverage   75.30%   75.80%   +0.50%     
   ==========================================
     Files         152      153       +1     
     Lines       25275    25924     +649     
   ==========================================
   + Hits        19033    19652     +619     
   - Misses       6242     6272      +30     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `36.17% <0.00%> (-0.05%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `62.32% <0.00%> (-0.08%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `39.63% <0.00%> (-0.15%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.46% <0.00%> (-0.16%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.08% <0.00%> (-0.39%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `80.62% <ø> (+0.52%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.32% <0.00%> (-0.14%)` | :arrow_down: |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `58.53% <20.00%> (-1.97%)` | :arrow_down: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `86.45% <97.29%> (+0.39%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `98.50% <100.00%> (+0.02%)` | :arrow_up: |
   | ... and [18 more](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c8ab5a4...9cb007a](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-853254367


   should the readme be updated to include semi join as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-853266206


   > should the readme be updated to include semi join as well?
   
   I think for now we don't - I would say that at least need to have `in` or exists to be mapped to a semi join, as currently there is not really a way for anyone to create it (other than creating a `LogicalPlan` explicitly).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-852533237


   I plan to review this tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644331235



##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       I think the only way the compiler can do it now is by explicitly matching on all the join types, but as this feature is not added https://github.com/rust-lang/rfcs/blob/master/text/2535-or-patterns.md that would be quite verbose currently




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-852238261


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#470](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bd9e4a8) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/c8ab5a4f00fc8b362eed72d5feb43b03b8ad1fdd?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c8ab5a4) will **increase** coverage by `0.11%`.
   > The diff coverage is `80.48%`.
   
   > :exclamation: Current head bd9e4a8 differs from pull request most recent head 8c8ecf7. Consider uploading reports for the commit 8c8ecf7 to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/470/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #470      +/-   ##
   ==========================================
   + Coverage   75.30%   75.42%   +0.11%     
   ==========================================
     Files         152      153       +1     
     Lines       25275    25715     +440     
   ==========================================
   + Hits        19033    19395     +362     
   - Misses       6242     6320      +78     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...sta/rust/core/src/serde/logical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vZnJvbV9wcm90by5ycw==) | `36.17% <0.00%> (-0.05%)` | :arrow_down: |
   | [...lista/rust/core/src/serde/logical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9sb2dpY2FsX3BsYW4vdG9fcHJvdG8ucnM=) | `62.32% <0.00%> (-0.08%)` | :arrow_down: |
   | [...ta/rust/core/src/serde/physical\_plan/from\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL2Zyb21fcHJvdG8ucnM=) | `39.63% <0.00%> (-0.15%)` | :arrow_down: |
   | [...ista/rust/core/src/serde/physical\_plan/to\_proto.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YmFsbGlzdGEvcnVzdC9jb3JlL3NyYy9zZXJkZS9waHlzaWNhbF9wbGFuL3RvX3Byb3RvLnJz) | `50.46% <0.00%> (-0.16%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL2J1aWxkZXIucnM=) | `90.08% <0.00%> (-0.39%)` | :arrow_down: |
   | [datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvbG9naWNhbF9wbGFuL3BsYW4ucnM=) | `80.10% <ø> (ø)` | |
   | [datafusion/src/physical\_plan/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9wbGFubmVyLnJz) | `80.32% <0.00%> (-0.14%)` | :arrow_down: |
   | [datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvb3B0aW1pemVyL2hhc2hfYnVpbGRfcHJvYmVfb3JkZXIucnM=) | `60.83% <50.00%> (+0.32%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX2pvaW4ucnM=) | `86.41% <96.87%> (+0.35%)` | :arrow_up: |
   | [datafusion/src/physical\_plan/hash\_utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9oYXNoX3V0aWxzLnJz) | `98.50% <100.00%> (+0.02%)` | :arrow_up: |
   | ... and [5 more](https://codecov.io/gh/apache/arrow-datafusion/pull/470/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [c8ab5a4...8c8ecf7](https://codecov.io/gh/apache/arrow-datafusion/pull/470?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#issuecomment-853247646


   Related PR for anti join here: https://github.com/apache/arrow-datafusion/pull/482


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470#discussion_r644334211



##########
File path: datafusion/src/optimizer/hash_build_probe_order.rs
##########
@@ -187,6 +187,7 @@ impl OptimizerRule for HashBuildProbeOrder {
             | LogicalPlan::CreateExternalTable { .. }
             | LogicalPlan::Explain { .. }
             | LogicalPlan::Union { .. }
+            | LogicalPlan::Join { .. }

Review comment:
       I rewrote it a bit to use a function in the body of the join match




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb merged pull request #470: Support semi join

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #470:
URL: https://github.com/apache/arrow-datafusion/pull/470


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org