Posted to github@arrow.apache.org by "andygrove (via GitHub)" <gi...@apache.org> on 2023/04/10 21:56:38 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue, #5949: Regression in 22.0.0 with filter push-down

andygrove opened a new issue, #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949

   ### Describe the bug
   
   The following code is from a unit test in arrow-datafusion-python. It works correctly with DataFusion 21.1.0.
   
   ```python
       df = df.select(
           column("a") + column("b"),
           column("a") - column("b"),
       ).filter(column("a") > literal(2))
   
       # execute and collect the first (and only) batch
       result = df.collect()[0]
   ```
   
   When I upgrade to DF 22.0.0, it fails with:
   
   ```
   Exception: Schema error: No field named ca29d730badd94c3d96481c7edf6255b0.a. Valid fields are "ca29d730badd94c3d96481c7edf6255b0.a + ca29d730badd94c3d96481c7edf6255b0.b", "ca29d730badd94c3d96481c7edf6255b0.a - ca29d730badd94c3d96481c7edf6255b0.b".
   ```
   
   
   ### To Reproduce
   
   See https://github.com/apache/arrow-datafusion-python/pull/320
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Dandandan commented on issue #5949: Regression in 22.0.0 with filter push-down

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1503208583

   I agree; the program as shown in the description *should* fail.




[GitHub] [arrow-datafusion] andygrove closed issue #5949: Regression in 22.0.0 with filter push-down

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove closed issue #5949: Regression in 22.0.0 with filter push-down
URL: https://github.com/apache/arrow-datafusion/issues/5949




[GitHub] [arrow-datafusion] jiangzhx commented on issue #5949: Regression in 22.0.0 with filter push-down

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502880319

   Hi @jackwener,
   regarding PR #5686: after I rebased onto the main branch,
   
   `cargo test --color=always --test dataframe test_count_wildcard_on_window` fails with an error.
   
   Before rebasing onto main, it worked fine.




[GitHub] [arrow-datafusion] andygrove commented on issue #5949: Regression in 22.0.0 with filter push-down

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1503862812

   Thanks for looking into this. I updated the tests to perform the filter before the projection.
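
Why the order matters can be sketched without DataFusion at all. The snippet below is a minimal pure-Python model, not the arrow-datafusion-python test itself: the sample rows and both helper functions are hypothetical and exist only for illustration. Filtering first works because column `a` still exists; projecting first drops `a`, so the later filter has nothing to read.

```python
# Hypothetical sample rows; the real test data is not shown in this thread.
rows = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]


def project(batch):
    """Keep only the two computed columns, as the select() call does."""
    return [{"a + b": r["a"] + r["b"], "a - b": r["a"] - r["b"]} for r in batch]


def keep_a_greater_than_2(batch):
    """Filter on column a; raises KeyError if a is no longer present."""
    return [r for r in batch if r["a"] > 2]


# Filter before projection: column a is still available to the predicate.
filter_first = project(keep_a_greater_than_2(rows))

# Projection before filter: column a has already been replaced.
try:
    keep_a_greater_than_2(project(rows))
    project_first_error = None
except KeyError as exc:
    project_first_error = exc.args[0]
```

In this toy model `filter_first` holds the single surviving row and `project_first_error` names the missing column, mirroring the schema error reported in the issue description.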




[GitHub] [arrow-datafusion] jackwener commented on issue #5949: Regression in 22.0.0 with filter push-down

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502967580

   It is indeed covered up by `skip failed rule`:
   
   ```
   | initial_logical_plan                                       | Filter: t.a > Int32(1)                                                      |
   |                                                            |   Projection: t.a - t.b                                                     |
   |                                                            |     TableScan: t                                                            |
   | logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                          |
   | logical_plan after replace_distinct_aggregate              | SAME TEXT AS ABOVE                                                          |
   | logical_plan after decorrelate_where_exists                | SAME TEXT AS ABOVE                                                          |
   | logical_plan after decorrelate_where_in                    | SAME TEXT AS ABOVE                                                          |
   | logical_plan after scalar_subquery_to_join                 | SAME TEXT AS ABOVE                                                          |
   | logical_plan after extract_equijoin_predicate              | SAME TEXT AS ABOVE                                                          |
   | logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                          |
   | logical_plan after merge_projection                        | SAME TEXT AS ABOVE                                                          |
   | logical_plan after rewrite_disjunctive_predicate           | SAME TEXT AS ABOVE                                                          |
   | logical_plan after eliminate_duplicated_expr               | SAME TEXT AS ABOVE                                                          |
   | logical_plan after eliminate_filter                        | SAME TEXT AS ABOVE                                                          |
   | logical_plan after eliminate_cross_join                    | SAME TEXT AS ABOVE                                                          |
   | logical_plan after eliminate_limit                         | SAME TEXT AS ABOVE                                                          |
   | logical_plan after propagate_empty_relation                | SAME TEXT AS ABOVE                                                          |
   | logical_plan after filter_null_join_keys                   | SAME TEXT AS ABOVE                                                          |
   | logical_plan after eliminate_outer_join                    | SAME TEXT AS ABOVE                                                          |
   | logical_plan after push_down_limit                         | SAME TEXT AS ABOVE                                                          |
   | logical_plan after push_down_filter                        | Projection: t.a - t.b                                                       |
   |                                                            |   Filter: t.a > Int32(1)                                                    |
   |                                                            |     TableScan: t                                                            |
   ```
   
   After investigating, I found the cause of this problem.
   
   The initial plan is wrong, but because we did not have an analyzer, the problem was covered up by `push_down_filter`.
   
   After `push_down_filter` runs, the wrong plan becomes a correct plan.
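
The two plans in the EXPLAIN output above can be compared with a small schema check. This is a sketch under stated assumptions, not DataFusion code: the `TableScan`, `Projection`, and `Filter` classes and the `schema` function are hypothetical stand-ins that only track column names.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TableScan:
    columns: List[str]  # column names provided by the table


@dataclass
class Projection:
    exprs: List[str]  # output column names, e.g. "t.a - t.b"
    refs: List[str]   # input columns the expressions read
    input: "Plan"


@dataclass
class Filter:
    refs: List[str]   # input columns the predicate reads
    input: "Plan"


Plan = Union[TableScan, Projection, Filter]


def schema(plan: Plan) -> List[str]:
    """Return output column names, raising if a node reads a missing column."""
    if isinstance(plan, TableScan):
        return plan.columns
    available = schema(plan.input)
    for name in plan.refs:
        if name not in available:
            raise KeyError(f"No field named {name}. Valid fields are {available}")
    # A filter keeps its input schema; a projection replaces it.
    return available if isinstance(plan, Filter) else plan.exprs


scan = TableScan(["t.a", "t.b"])

# initial_logical_plan: Filter on top of Projection.
initial = Filter(["t.a"], Projection(["t.a - t.b"], ["t.a", "t.b"], scan))

# Plan after push_down_filter: Projection on top of Filter.
pushed = Projection(["t.a - t.b"], ["t.a", "t.b"], Filter(["t.a"], scan))

try:
    schema(initial)
    initial_is_valid = True
except KeyError:
    initial_is_valid = False

pushed_schema = schema(pushed)
```

In this model the initial plan fails validation because the filter reads `t.a` from a projection that only outputs `t.a - t.b`, while the pushed-down plan validates, which is how the rewrite could hide the problem.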
   




[GitHub] [arrow-datafusion] jackwener commented on issue #5949: Regression in 22.0.0 with filter push-down

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502981688

   So I don't think this is a regression.
   
   **But** if we want to support it, I think we can add an analyzer rule that adds the missing column. I'm not sure whether we need to do this.
   
   




[GitHub] [arrow-datafusion] jackwener commented on issue #5949: Regression in 22.0.0 with filter push-down

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502636600

   It seems that this plan is unreasonable.
   ```
   select(
           column("a") + column("b"),
           column("a") - column("b"),
       )
   # this yields two columns: (a + b) and (a - b)
   
   # the filter then can't find col(a)
   ```
   
   > why this PR would have introduced this change in behavior?
   
   In my opinion, `type coercion` was originally in the `optimizer`, and the `optimizer` can skip a failed rule, so the problem was covered up.
   After it was moved into the `Analyzer`, which doesn't skip failed rules, the problem was exposed.
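
The difference between the two behaviors can be illustrated with a toy rule runner. This is only a sketch: plans are plain strings, and `check_columns`, `push_down_filter`, and `run_rules` are hypothetical stand-ins, not DataFusion's actual rule implementations.

```python
from typing import Callable, List

Rule = Callable[[str], str]  # a rule rewrites a plan (here just a string)

INVALID_PLAN = "filter(a) over projection(a + b, a - b)"


def check_columns(plan: str) -> str:
    """Stand-in for type coercion: fails on the invalid plan."""
    if plan == INVALID_PLAN:
        raise ValueError("Schema error: No field named a")
    return plan


def push_down_filter(plan: str) -> str:
    """Stand-in for the rewrite that happens to repair the plan."""
    if plan == INVALID_PLAN:
        return "projection(a + b, a - b) over filter(a)"
    return plan


def run_rules(plan: str, rules: List[Rule], skip_failed: bool) -> str:
    for rule in rules:
        try:
            plan = rule(plan)
        except ValueError:
            if not skip_failed:
                raise  # analyzer-like: the first failure aborts planning
            # optimizer-like: ignore the failed rule and keep the old plan
    return plan


rules = [check_columns, push_down_filter]

# Optimizer-like run: the failure is swallowed and a later rule fixes the plan.
lenient_result = run_rules(INVALID_PLAN, rules, skip_failed=True)

# Analyzer-like run: the same failure surfaces as an error.
try:
    run_rules(INVALID_PLAN, rules, skip_failed=False)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

With `skip_failed=True` the invalid plan silently ends up repaired, whereas with `skip_failed=False` the schema error reaches the caller, which matches the change in behavior described above.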




[GitHub] [arrow-datafusion] andygrove commented on issue #5949: Regression in 22.0.0 with filter push-down

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502412542

   The issue seems to have been introduced in https://github.com/apache/arrow-datafusion/pull/5831. @jackwener do you know why this PR would have introduced this change in behavior?

