You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/15 21:53:46 UTC

[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request, #2729: fix: filter push down with `InList` expressions

Ted-Jiang opened a new pull request, #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729

   # Which issue does this PR close?
   
   Closes #2725 .
   
    # Rationale for this change
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   # What changes are included in this PR?
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r897613695


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -1831,4 +1831,218 @@ mod tests {
 
         Ok(())
     }
+
+    #[test]
+    fn test_filter_with_alias() -> Result<()> {

Review Comment:
   I think we should keep these tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1156773298

   The changes look good but I think the title should be updated to "Support filter push down with InList expressions"?
   
   It looks like we will also need to implement a similar change for `InSubquery` expressions so would be good to file an issue for that as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r897290860


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -122,6 +122,27 @@ fn remove_filters(
         .collect::<Vec<_>>()
 }
 
+// rename all filter columns which have alias name
+fn rename_filters_column_name(
+    filters: &mut [Predicate],
+    alias_cols_expr_and_name: &HashMap<&Expr, &String>,
+) {
+    for (expr, columns) in filters {
+        if alias_cols_expr_and_name.contains_key(expr) {
+            let col_string = <&std::string::String>::clone(
+                alias_cols_expr_and_name.get(expr).unwrap(),
+            );

Review Comment:
   ```suggestion
               let col_string = alias_cols_expr_and_name.get(expr).unwrap();
   ```
   
   Seemed to work for me locally



##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -336,6 +357,18 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> Result<LogicalPlan> {
         LogicalPlan::Analyze { .. } => push_down(&state, plan),
         LogicalPlan::Filter(Filter { input, predicate }) => {
             let mut predicates = vec![];
+            let mut alias_cols_expr_and_name = HashMap::new();
+            //Need rewrite column name before push down
+            let input_plan = &*input.clone();
+            if let LogicalPlan::Projection(projection) = input_plan {

Review Comment:
   I wonder if you can use the `LogicalPlan`'s output schema to detect the case you are looking for
   
   I am still a little confused about where the issue is -- is the filter push down logic adding an extra alias? Or is it not adding an alias it should?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1155942916

   Got the bug! :[add fail test](https://github.com/apache/arrow-datafusion/pull/2729/commits/b40a85bcf2c2bca97777cd33d9889ed6f5d1a72c)
    not adding an alias it should in `IN` filter


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #2729: fix: filter push down with `InList` expressions

Posted by GitBox <gi...@apache.org>.
alamb merged PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1156578104

   I am still planning on reviewing this and will hopefully have time today.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r898457063


##########
datafusion/optimizer/src/utils.rs:
##########
@@ -276,9 +276,13 @@ pub fn rewrite_expression(expr: &Expr, expressions: &[Expr]) -> Result<Expr> {
         }
         Expr::Not(_) => Ok(Expr::Not(Box::new(expressions[0].clone()))),
         Expr::Negative(_) => Ok(Expr::Negative(Box::new(expressions[0].clone()))),
+        Expr::InList { list, negated, .. } => Ok(Expr::InList {
+            expr: Box::new(expressions[0].clone()),
+            list: list.clone(),
+            negated: *negated,
+        }),
         Expr::Column(_)
         | Expr::Literal(_)
-        | Expr::InList { .. }
         | Expr::Exists { .. }

Review Comment:
   🤔  looks like `Exists` and `Subquery` may need the same treatment (not in this PR though).... I'll file a follow on



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r897490349


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -1831,4 +1870,111 @@ mod tests {
 
         Ok(())
     }
+

Review Comment:
   @alamb  you are right! without the code change the ut still passed
   
   > I am still a little confused about where the issue is -- is the filter push down logic adding an extra alias? Or is it not adding an alias it should?
   
   I think it is `not` adding an alias it should,  but the ut show it doesn't 😂
   
   But I got a log from send logical plan to ballista-scheduler
   ```
   After apply projection_push_down rule:
   
   Optimized logical plan:
   Limit: 50000
     Projection: #LINEORDER.LO_SHIPMODE, #ASS
       Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #KYLINAPPROXCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_,UInt8(10)) AS ASS
         Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[KYLINAPPROXCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, UInt8(10))]]
           Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
     hi      Filter: #LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])
               Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
    alias         Projection: #ssb@kylin_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@kylin_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
                   TableScan: ssb@kylin_udaf_cube_update@17179869183 projection=Some([12, 36])
   
   After apply filter_push_down rule:
   
   Optimized logical plan:
   Limit: 50000
     Projection: #LINEORDER.LO_SHIPMODE, #ASS
       Projection: #LO_SHIPMODE AS LINEORDER.LO_SHIPMODE, #KYLINAPPROXCOUNTDISTINCT(_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_,UInt8(10)) AS ASS
         Aggregate: groupBy=[[#LO_SHIPMODE]], aggr=[[KYLINAPPROXCOUNTDISTINCT(#_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_, UInt8(10))]]
           Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
             Projection: #LO_SHIPMODE, #_KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
     alias     Projection: #ssb@kylin_udaf_cube_update@17179869183.21 AS LO_SHIPMODE, #ssb@kylin_udaf_cube_update@17179869183.41 AS _KY_COUNT_DISTINCT_LINEORDER_LO_SHIPPRIORITY_LINEORDER_LO_SUPPKEY_
     hi          Filter: #LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])
                   TableScan: ssb@kylin_udaf_cube_update@17179869183 projection=Some([12, 36]), partial_filters=[#LO_SHIPMODE IN ([Utf8("SHIP"), Utf8("Rail"), Utf8("2321"), Utf8("MAIL")])]
   
   ```
   It show not adding an alias it should 🤔, in our internal version datafusion may lost some commit, but add this pr ut still passed😭
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1154903331

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2729?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2729](https://codecov.io/gh/apache/arrow-datafusion/pull/2729?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0a386ce) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/ed0fe8cb19a1a6477a2821d08bcd520c3c2b9441?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ed0fe8c) will **increase** coverage by `0.17%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #2729      +/-   ##
   ==========================================
   + Coverage   84.70%   84.88%   +0.17%     
   ==========================================
     Files         270      270              
     Lines       47262    47768     +506     
   ==========================================
   + Hits        40035    40549     +514     
   + Misses       7227     7219       -8     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2729?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/expr/src/expr.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9leHByLnJz) | `86.09% <ø> (+4.58%)` | :arrow_up: |
   | [datafusion/optimizer/src/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL2ZpbHRlcl9wdXNoX2Rvd24ucnM=) | `98.27% <100.00%> (+0.11%)` | :arrow_up: |
   | [...atafusion/core/src/physical\_plan/aggregates/mod.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9wbGFuL2FnZ3JlZ2F0ZXMvbW9kLnJz) | `91.16% <0.00%> (-3.44%)` | :arrow_down: |
   | [datafusion/expr/src/expr\_fn.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9leHByX2ZuLnJz) | `88.23% <0.00%> (-3.23%)` | :arrow_down: |
   | [datafusion/core/tests/dataframe.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL2RhdGFmcmFtZS5ycw==) | `98.62% <0.00%> (-1.38%)` | :arrow_down: |
   | [datafusion/expr/src/utils.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy91dGlscy5ycw==) | `90.80% <0.00%> (-0.39%)` | :arrow_down: |
   | [datafusion/core/tests/sql/aggregates.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL3NxbC9hZ2dyZWdhdGVzLnJz) | `99.27% <0.00%> (-0.10%)` | :arrow_down: |
   | [...afusion/core/src/physical\_optimizer/repartition.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9vcHRpbWl6ZXIvcmVwYXJ0aXRpb24ucnM=) | `100.00% <0.00%> (ø)` | |
   | [datafusion/optimizer/src/projection\_push\_down.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9vcHRpbWl6ZXIvc3JjL3Byb2plY3Rpb25fcHVzaF9kb3duLnJz) | `98.08% <0.00%> (+<0.01%)` | :arrow_up: |
   | [datafusion/expr/src/logical\_plan/builder.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vYnVpbGRlci5ycw==) | `89.33% <0.00%> (+0.01%)` | :arrow_up: |
   | ... and [15 more](https://codecov.io/gh/apache/arrow-datafusion/pull/2729/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2729?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2729?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ed0fe8c...0a386ce](https://codecov.io/gh/apache/arrow-datafusion/pull/2729?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2729: fix: filter push down with `InList` expressions

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r898628093


##########
datafusion/optimizer/src/utils.rs:
##########
@@ -276,9 +276,13 @@ pub fn rewrite_expression(expr: &Expr, expressions: &[Expr]) -> Result<Expr> {
         }
         Expr::Not(_) => Ok(Expr::Not(Box::new(expressions[0].clone()))),
         Expr::Negative(_) => Ok(Expr::Negative(Box::new(expressions[0].clone()))),
+        Expr::InList { list, negated, .. } => Ok(Expr::InList {
+            expr: Box::new(expressions[0].clone()),
+            list: list.clone(),
+            negated: *negated,
+        }),
         Expr::Column(_)
         | Expr::Literal(_)
-        | Expr::InList { .. }
         | Expr::Exists { .. }

Review Comment:
   Agree!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
andygrove commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1155259324

   I will review this today


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r897490955


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -1904,6 +1904,42 @@ mod tests {
         Ok(())
     }
 
+    #[test]
+    fn test_filter_with_alias_2() -> Result<()> {
+        // in table scan the true col name is 'test.a',
+        // but we rename it as 'b', and use col 'b' in filter
+        // we need rewrite filter col before push down.
+        let table_scan = test_table_scan()?;
+        let plan = LogicalPlanBuilder::from(table_scan)
+            .project(vec![col("a").alias("b"), col("c")])?
+            .project(vec![col("b"), col("c")])?
+            .filter(and(col("b").gt(lit(10i64)), col("c").gt(lit(10i64))))?
+            .build()?;
+
+        // filter on col b
+        assert_eq!(
+            format!("{:?}", plan),

Review Comment:
   add test for: having two projection between filter



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed pull request #2729: fix: filter push down with `InList` expressions

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #2729: fix: filter push down with `InList` expressions
URL: https://github.com/apache/arrow-datafusion/pull/2729


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #2729: fix: filter push down with `InList` expressions

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1157150678

   @alamb Thank you for your carefulness review and kind advice 👍


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r897613300


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -1831,4 +1831,218 @@ mod tests {
 
         Ok(())
     }
+
+    #[test]
+    fn test_filter_with_alias() -> Result<()> {
+        // in table scan the true col name is 'test.a',
+        // but we rename it as 'b', and use col 'b' in filter
+        // we need rewrite filter col before push down.
+        let table_scan = test_table_scan()?;
+        let plan = LogicalPlanBuilder::from(table_scan)
+            .project(vec![col("a").alias("b"), col("c")])?
+            .filter(and(col("b").gt(lit(10i64)), col("c").gt(lit(10i64))))?
+            .build()?;
+
+        // filter on col b
+        assert_eq!(
+            format!("{:?}", plan),
+            "\
+            Filter: #b > Int64(10) AND #test.c > Int64(10)\
+            \n  Projection: #test.a AS b, #test.c\
+            \n    TableScan: test projection=None\
+            "
+        );
+
+        // rewrite filter col b to test.a
+        let expected = "\
+            Projection: #test.a AS b, #test.c\
+            \n  Filter: #test.a > Int64(10) AND #test.c > Int64(10)\
+            \n    TableScan: test projection=None\
+            ";
+
+        assert_optimized_plan_eq(&plan, expected);
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_filter_with_alias_2() -> Result<()> {
+        // in table scan the true col name is 'test.a',
+        // but we rename it as 'b', and use col 'b' in filter
+        // we need rewrite filter col before push down.
+        let table_scan = test_table_scan()?;
+        let plan = LogicalPlanBuilder::from(table_scan)
+            .project(vec![col("a").alias("b"), col("c")])?
+            .project(vec![col("b"), col("c")])?
+            .filter(and(col("b").gt(lit(10i64)), col("c").gt(lit(10i64))))?
+            .build()?;
+
+        // filter on col b
+        assert_eq!(
+            format!("{:?}", plan),
+            "\
+            Filter: #b > Int64(10) AND #test.c > Int64(10)\
+            \n  Projection: #b, #test.c\
+            \n    Projection: #test.a AS b, #test.c\
+            \n      TableScan: test projection=None\
+            "
+        );
+
+        // rewrite filter col b to test.a
+        let expected = "\
+            Projection: #b, #test.c\
+            \n  Projection: #test.a AS b, #test.c\
+            \n    Filter: #test.a > Int64(10) AND #test.c > Int64(10)\
+            \n      TableScan: test projection=None\
+            ";
+
+        assert_optimized_plan_eq(&plan, expected);
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_filter_with_multi_alias() -> Result<()> {
+        let table_scan = test_table_scan()?;
+        let plan = LogicalPlanBuilder::from(table_scan)
+            .project(vec![col("a").alias("b"), col("c").alias("d")])?
+            .filter(and(col("b").gt(lit(10i64)), col("d").gt(lit(10i64))))?
+            .build()?;
+
+        // filter on col b and d
+        assert_eq!(
+            format!("{:?}", plan),
+            "\
+            Filter: #b > Int64(10) AND #d > Int64(10)\
+            \n  Projection: #test.a AS b, #test.c AS d\
+            \n    TableScan: test projection=None\
+            "
+        );
+
+        // rewrite filter col b to test.a, col d to test.c
+        let expected = "\
+            Projection: #test.a AS b, #test.c AS d\
+            \n  Filter: #test.a > Int64(10) AND #test.c > Int64(10)\
+            \n    TableScan: test projection=None\
+            ";
+
+        assert_optimized_plan_eq(&plan, expected);
+
+        Ok(())
+    }
+
+    /// predicate on join key in filter expression should be pushed down to both inputs
+    #[test]
+    fn join_filter_with_alias() -> Result<()> {
+        let table_scan = test_table_scan()?;
+        let left = LogicalPlanBuilder::from(table_scan)
+            .project(vec![col("a").alias("c")])?
+            .build()?;
+        let right_table_scan = test_table_scan_with_name("test2")?;
+        let right = LogicalPlanBuilder::from(right_table_scan)
+            .project(vec![col("b").alias("d")])?
+            .build()?;
+        let filter = col("c").gt(lit(1u32));
+        let plan = LogicalPlanBuilder::from(left)
+            .join(
+                &right,
+                JoinType::Inner,
+                (vec![Column::from_name("c")], vec![Column::from_name("d")]),
+                Some(filter),
+            )?
+            .build()?;
+
+        assert_eq!(
+            format!("{:?}", plan),
+            "\
+            Inner Join: #c = #d Filter: #c > UInt32(1)\
+            \n  Projection: #test.a AS c\
+            \n    TableScan: test projection=None\
+            \n  Projection: #test2.b AS d\
+            \n    TableScan: test2 projection=None"
+        );
+
+        // Change filter on col `c`, 'd' to `test.a`, 'test.b'
+        let expected = "\
+        Inner Join: #c = #d\
+        \n  Projection: #test.a AS c\
+        \n    Filter: #test.a > UInt32(1)\
+        \n      TableScan: test projection=None\
+        \n  Projection: #test2.b AS d\
+        \n    Filter: #test2.b > UInt32(1)\
+        \n      TableScan: test2 projection=None";
+        assert_optimized_plan_eq(&plan, expected);
+        Ok(())
+    }
+
+    #[test]
+    fn test_in_filter_with_alias() -> Result<()> {

Review Comment:
   This test will fail without code change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r896584736


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -336,6 +357,18 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> Result<LogicalPlan> {
         LogicalPlan::Analyze { .. } => push_down(&state, plan),
         LogicalPlan::Filter(Filter { input, predicate }) => {
             let mut predicates = vec![];
+            let mut alias_cols_expr_and_name = HashMap::new();
+            //Need rewrite column name before push down
+            let input_plan = &*input.clone();
+            if let LogicalPlan::Projection(projection) = input_plan {

Review Comment:
   I think i should get all `Expr::Alias`, not only one level🤔



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2729: fix: filter push down with `InList` expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1156983491

   Thanks again @Ted-Jiang  for pushing through and finding the root cause


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #2729: fix: filter push down with `InList` expressions

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#issuecomment-1156983256

   Filed https://github.com/apache/arrow-datafusion/issues/2736 for follow on task


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r897292665


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -1831,4 +1870,111 @@ mod tests {
 
         Ok(())
     }
+

Review Comment:
   When I remove the code changes in this PR the tests still pass 🤔 
   
   So when I apply this diff:
   
   ```diff
   diff --git a/datafusion/optimizer/src/filter_push_down.rs b/datafusion/optimizer/src/filter_push_down.rs
   index 1629e95c5..2668eaaf7 100644
   --- a/datafusion/optimizer/src/filter_push_down.rs
   +++ b/datafusion/optimizer/src/filter_push_down.rs
   @@ -122,27 +122,6 @@ fn remove_filters(
            .collect::<Vec<_>>()
    }
    
   -// rename all filter columns which have alias name
   -fn rename_filters_column_name(
   -    filters: &mut [Predicate],
   -    alias_cols_expr_and_name: &HashMap<&Expr, &String>,
   -) {
   -    for (expr, columns) in filters {
   -        if alias_cols_expr_and_name.contains_key(expr) {
   -            let col_string = <&std::string::String>::clone(
   -                alias_cols_expr_and_name.get(expr).unwrap(),
   -            );
   -            let column = Column::from_qualified_name(col_string);
   -            if let Expr::Column(col) = expr {
   -                columns.remove(col);
   -            } else {
   -                unreachable!()
   -            }
   -            columns.insert(column);
   -        }
   -    }
   -}
   -
    /// builds a new [LogicalPlan] from `plan` by issuing new [LogicalPlan::Filter] if any of the filters
    /// in `state` depend on the columns `used_columns`.
    fn issue_filters(
   @@ -357,18 +336,6 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> Result<LogicalPlan> {
            LogicalPlan::Analyze { .. } => push_down(&state, plan),
            LogicalPlan::Filter(Filter { input, predicate }) => {
                let mut predicates = vec![];
   -            let mut alias_cols_expr_and_name = HashMap::new();
   -            //Need rewrite column name before push down
   -            let input_plan = &*input.clone();
   -            if let LogicalPlan::Projection(projection) = input_plan {
   -                let exprs = &projection.expr;
   -                for e in exprs {
   -                    if let Expr::Alias(col_expr, alias_name) = e {
   -                        alias_cols_expr_and_name.insert(col_expr.as_ref(), alias_name);
   -                    }
   -                }
   -            }
   -
                utils::split_conjunction(predicate, &mut predicates);
    
                // Predicates without referencing columns (WHERE FALSE, WHERE 1=1, etc.)
   @@ -397,12 +364,6 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> Result<LogicalPlan> {
                        &no_col_predicates,
                    ))
                } else {
   -                if !alias_cols_expr_and_name.is_empty() {
   -                    rename_filters_column_name(
   -                        &mut state.filters,
   -                        &alias_cols_expr_and_name,
   -                    )
   -                }
                    optimize(input, state)
                }
            }
   ```
   
   The tests all pass:
   
   ```
   
   -*- mode: compilation; default-directory: "~/Software/arrow-datafusion/" -*-
   Compilation started at Tue Jun 14 16:27:52
   
   cd /Users/alamb/Software/arrow-datafusion && RUST_BACKTRACE=1 CARGO_TARGET_DIR=/Users/alamb/Software/df-target cargo test -p datafusion-optimizer -- filter_push_down
      Compiling datafusion-optimizer v9.0.0 (/Users/alamb/Software/arrow-datafusion/datafusion/optimizer)
       Finished test [unoptimized + debuginfo] target(s) in 4.38s
        Running unittests src/lib.rs (/Users/alamb/Software/df-target/debug/deps/datafusion_optimizer-ad133d04102e7238)
   
   running 39 tests
   test filter_push_down::tests::filter_no_columns ... ok
   test filter_push_down::tests::filter_before_projection ... ok
   test filter_push_down::tests::alias ... ok
   test filter_push_down::tests::filter_jump_2_plans ... ok
   test filter_push_down::tests::filter_after_limit ... ok
   test filter_push_down::tests::complex_expression ... ok
   test filter_push_down::tests::filter_keep_agg ... ok
   test filter_push_down::tests::filter_move_agg ... ok
   test filter_push_down::tests::complex_plan ... ok
   test filter_push_down::tests::filter_2_breaks_limits ... ok
   test filter_push_down::tests::double_limit ... ok
   test filter_push_down::tests::filter_using_left_join ... ok
   test filter_push_down::tests::filter_on_join_on_common_independent ... ok
   test filter_push_down::tests::filter_join_on_one_side ... ok
   test filter_push_down::tests::filter_join_on_common_dependent ... ok
   test filter_push_down::tests::filter_with_table_provider_exact ... ok
   test filter_push_down::tests::filter_using_join_on_common_independent ... ok
   test filter_push_down::tests::filter_with_table_provider_inexact ... ok
   test filter_push_down::tests::filter_using_left_join_on_common ... ok
   test filter_push_down::tests::filter_with_table_provider_unsupported ... ok
   test filter_push_down::tests::filters_user_defined_node ... ok
   test filter_push_down::tests::filter_with_table_provider_multiple_invocations ... ok
   test filter_push_down::tests::filter_using_right_join ... ok
   test filter_push_down::tests::filter_using_right_join_on_common ... ok
   test filter_push_down::tests::full_join_on_with_filter ... ok
   test filter_push_down::tests::multi_combined_filter ... ok
   test filter_push_down::tests::join_filter_on_common ... ok
   test filter_push_down::tests::join_filter_with_alias ... ok
   test filter_push_down::tests::join_filter_removed ... ok
   test filter_push_down::tests::left_join_on_with_filter ... ok
   test filter_push_down::tests::join_on_with_filter ... ok
   test filter_push_down::tests::test_filter_with_alias ... ok
   test filter_push_down::tests::test_filter_with_multi_alias ... ok
   test filter_push_down::tests::two_filters_on_same_depth ... ok
   test filter_push_down::tests::union_all ... ok
   test filter_push_down::tests::union_all_with_alias ... ok
   test filter_push_down::tests::multi_filter ... ok
   test filter_push_down::tests::split_filter ... ok
   test filter_push_down::tests::right_join_on_with_filter ... ok
   
   test result: ok. 39 passed; 0 failed; 0 ignored; 0 measured; 118 filtered out; finished in 0.07s
   
      Doc-tests datafusion_optimizer
   
   running 0 tests
   
   test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 2 filtered out; finished in 0.00s
   
   
   Compilation finished at Tue Jun 14 16:27:56
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2729: Filter push down need consider alias columns

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on code in PR #2729:
URL: https://github.com/apache/arrow-datafusion/pull/2729#discussion_r896618912


##########
datafusion/optimizer/src/filter_push_down.rs:
##########
@@ -336,6 +357,18 @@ fn optimize(plan: &LogicalPlan, mut state: State) -> Result<LogicalPlan> {
         LogicalPlan::Analyze { .. } => push_down(&state, plan),
         LogicalPlan::Filter(Filter { input, predicate }) => {
             let mut predicates = vec![];
+            let mut alias_cols_expr_and_name = HashMap::new();
+            //Need rewrite column name before push down
+            let input_plan = &*input.clone();
+            if let LogicalPlan::Projection(projection) = input_plan {

Review Comment:
   Is there one API to get all inputs'  all Expr::Alias



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org