Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/22 10:27:04 UTC

[GitHub] [arrow-datafusion] mrob95 opened a new pull request, #2765: Rewrite subexpressions of InSubquery in rewrite_expression

mrob95 opened a new pull request, #2765:
URL: https://github.com/apache/arrow-datafusion/pull/2765

   # Which issue does this PR close?
   
   
   Closes #2736.
   
   # Rationale for this change
   
   # What changes are included in this PR?
   
   Looking at the expressions mentioned in #2725, I think the only one that needs fixing is `Expr::InSubquery`. `Exists` and `ScalarSubquery` both have subqueries but no subexpressions, and it looks like subqueries are currently treated as completely separate plans for the purposes of optimisation. I'm new to working on query engines (and Rust, for that matter!), though, so I may be misinterpreting this.
   
   I've modified `rewrite_expression` so that `InSubquery` expressions are rebuilt from the subexpression passed in `expressions` rather than returned as a clone of the original, and added a pushdown test with an `InSubquery` predicate that uses an aliased column.
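
To illustrate the idea behind the `rewrite_expression` change, here is a minimal sketch on a toy expression type. The `Expr` and `Plan` types and the `String` error below are simplified stand-ins invented for this example, not DataFusion's actual definitions; only the shape of the fix (take the left-hand side from `expressions`, keep the subquery as-is) follows the description above.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a logical plan; only a label is kept.
#[derive(Debug, Clone, PartialEq)]
struct Plan(String);

// Toy expression type (an assumption for this sketch, not DataFusion's `Expr`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    // Has a subexpression (`expr`) and a subquery.
    InSubquery { expr: Box<Expr>, subquery: Arc<Plan>, negated: bool },
    // Has a subquery but no subexpression.
    Exists { subquery: Arc<Plan>, negated: bool },
}

/// Rebuild `expr`, taking its subexpressions from `expressions`.
fn rewrite_expression(expr: &Expr, expressions: &[Expr]) -> Result<Expr, String> {
    match expr {
        // Use the rewritten left-hand side instead of cloning the old one;
        // the subquery plan itself is carried over untouched.
        Expr::InSubquery { subquery, negated, .. } => {
            let inner = expressions
                .first()
                .ok_or_else(|| "InSubquery expects one subexpression".to_string())?;
            Ok(Expr::InSubquery {
                expr: Box::new(inner.clone()),
                subquery: Arc::clone(subquery),
                negated: *negated,
            })
        }
        // No subexpressions to replace, so a clone is sufficient.
        Expr::Column(_) | Expr::Exists { .. } => Ok(expr.clone()),
    }
}
```

With this shape, a filter on the alias `b` can be rewritten to refer to `test.a` simply by passing the replacement column in `expressions`, which is what pushing the filter below the projection requires.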
   
   Additionally, I noticed while I was creating the test case that `InSubquery` filters were not being pushed down at all. This was because the subexpressions of `InSubquery` were not being visited by `ExpressionVisitor`, so the column in `InSubquery` predicates was not found. Fixed with a simple modification to `expr_visitor.rs`. The test case I added covers this as well.
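
The visitor side of the fix can be sketched the same way. The toy `Expr` type and the `collect_columns` helper below are hypothetical and much simpler than the real `ExpressionVisitor`; the point is only that the traversal has to descend into the left-hand expression of `InSubquery`, otherwise the column it references is never found.

```rust
use std::collections::HashSet;

// Toy expression type (an assumption for this sketch, not DataFusion's `Expr`).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Not(Box<Expr>),
    // The subquery is represented by a plain label here.
    InSubquery { expr: Box<Expr>, subquery: String, negated: bool },
}

/// Collect the names of all columns referenced by `expr`.
fn collect_columns(expr: &Expr, found: &mut HashSet<String>) {
    match expr {
        Expr::Column(name) => {
            found.insert(name.clone());
        }
        Expr::Not(inner) => collect_columns(inner, found),
        // Descend into the left-hand expression. The subquery is deliberately
        // not visited: it is treated as a separate plan.
        Expr::InSubquery { expr, .. } => collect_columns(expr, found),
    }
}
```

If the `InSubquery` arm did nothing, the set of columns for a predicate such as `b IN (subquery)` would be empty, and a pushdown rule that relies on that set would have no way to tell which input the filter depends on.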
   
   # Are there any user-facing changes?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove merged pull request #2765: Rewrite subexpressions of InSubquery in rewrite_expression

Posted by GitBox <gi...@apache.org>.
andygrove merged PR #2765:
URL: https://github.com/apache/arrow-datafusion/pull/2765




[GitHub] [arrow-datafusion] mrob95 commented on pull request #2765: Rewrite subexpressions of InSubquery in rewrite_expression

Posted by GitBox <gi...@apache.org>.
mrob95 commented on PR #2765:
URL: https://github.com/apache/arrow-datafusion/pull/2765#issuecomment-1163136906

   > Bonus point for removing expr_expressions and rewrite_expression and instead use an ExprMutator
   
   Thanks, going to see if I can do this ^ as well, but that can be a separate PR :)




[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2765: Rewrite subexpressions of InSubquery in rewrite_expression

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2765:
URL: https://github.com/apache/arrow-datafusion/pull/2765#discussion_r903775768


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -2045,4 +2045,37 @@ mod tests {
 
         Ok(())
     }
+
+    #[test]
+    fn test_in_subquery_with_alias() -> Result<()> {
+        // in table scan the true col name is 'test.a',
+        // but we rename it as 'b', and use col 'b' in subquery filter
+        let table_scan = test_table_scan()?;
+        let table_scan_sq = test_table_scan_with_name("sq")?;
+        let subplan = Arc::new(
+            LogicalPlanBuilder::from(table_scan_sq)
+                .project(vec![col("c")])?
+                .build()?,
+        );
+        let plan = LogicalPlanBuilder::from(table_scan)
+            .project(vec![col("a").alias("b"), col("c")])?
+            .filter(in_subquery(col("b"), subplan))?
+            .build()?;
+
+        // filter on col b in subquery
+        let expected_before = "\
+        Filter: #b IN (Subquery: Projection: #sq.c\n  TableScan: sq projection=None)\
+        \n  Projection: #test.a AS b, #test.c\
+        \n    TableScan: test projection=None";
+        assert_eq!(format!("{:?}", plan), expected_before);
+
+        // rewrite filter col b to test.a

Review Comment:
   👍 





[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2765: Rewrite subexpressions of InSubquery in rewrite_expression

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2765:
URL: https://github.com/apache/arrow-datafusion/pull/2765#issuecomment-1162951190

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2765?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2765](https://codecov.io/gh/apache/arrow-datafusion/pull/2765?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (db47b2a) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/bc007766f8f83bc3f720c6bafbd979e9335bb432?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (bc00776) will **increase** coverage by `0.00%`.
   > The diff coverage is `89.47%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #2765   +/-   ##
   =======================================
     Coverage   84.95%   84.96%           
   =======================================
     Files         271      271           
     Lines       48053    48071   +18     
   =======================================
   + Hits        40824    40842   +18     
     Misses       7229     7229           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2765?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/expr/src/expr\_visitor.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2765/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9leHByX3Zpc2l0b3IucnM=) | `63.75% <0.00%> (ø)` | |
   | [datafusion/optimizer/src/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2765/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2ZpbHRlcl9wdXNoX2Rvd24ucnM=) | `98.23% <92.85%> (-0.10%)` | :arrow_down: |
   | [datafusion/optimizer/src/utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2765/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL3V0aWxzLnJz) | `36.08% <100.00%> (+2.39%)` | :arrow_up: |
   | [datafusion/expr/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2765/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `73.71% <0.00%> (-0.20%)` | :arrow_down: |
   
   ------
   
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2765?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [bc00776...db47b2a](https://codecov.io/gh/apache/arrow-datafusion/pull/2765?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   

