Posted to github@arrow.apache.org by "andygrove (via GitHub)" <gi...@apache.org> on 2023/04/10 21:56:38 UTC
[GitHub] [arrow-datafusion] andygrove opened a new issue, #5949: Regression in 22.0.0 with filter push-down
andygrove opened a new issue, #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949
### Describe the bug
The following code is from a unit test in arrow-datafusion-python. It works correctly with DataFusion 21.1.0.
```python
df = df.select(
    column("a") + column("b"),
    column("a") - column("b"),
).filter(column("a") > literal(2))
# execute and collect the first (and only) batch
result = df.collect()[0]
```
When I upgrade to DF 22.0.0, it fails with:
```
Exception: Schema error: No field named ca29d730badd94c3d96481c7edf6255b0.a. Valid fields are "ca29d730badd94c3d96481c7edf6255b0.a + ca29d730badd94c3d96481c7edf6255b0.b", "ca29d730badd94c3d96481c7edf6255b0.a - ca29d730badd94c3d96481c7edf6255b0.b".
```
### To Reproduce
See https://github.com/apache/arrow-datafusion-python/pull/320
### Expected behavior
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] Dandandan commented on issue #5949: Regression in 22.0.0 with filter push-down
Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1503208583
I agree, the program as shown in the description *should* fail.
[GitHub] [arrow-datafusion] andygrove closed issue #5949: Regression in 22.0.0 with filter push-down
Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove closed issue #5949: Regression in 22.0.0 with filter push-down
URL: https://github.com/apache/arrow-datafusion/issues/5949
[GitHub] [arrow-datafusion] jiangzhx commented on issue #5949: Regression in 22.0.0 with filter push-down
Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502880319
Hi @jackwener,
regarding PR #5686: after I rebased onto the main branch,
`cargo test --color=always --test dataframe test_count_wildcard_on_window` fails with an error.
Before rebasing onto main, it worked fine.
[GitHub] [arrow-datafusion] andygrove commented on issue #5949: Regression in 22.0.0 with filter push-down
Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1503862812
Thanks for looking into this. I updated the tests to perform the filter before the projection.
[GitHub] [arrow-datafusion] jackwener commented on issue #5949: Regression in 22.0.0 with filter push-down
Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502967580
It was indeed covered up by `skip failed rule`:
```
| initial_logical_plan | Filter: t.a > Int32(1) |
| | Projection: t.a - t.b |
| | TableScan: t |
| logical_plan after simplify_expressions | SAME TEXT AS ABOVE |
| logical_plan after replace_distinct_aggregate | SAME TEXT AS ABOVE |
| logical_plan after decorrelate_where_exists | SAME TEXT AS ABOVE |
| logical_plan after decorrelate_where_in | SAME TEXT AS ABOVE |
| logical_plan after scalar_subquery_to_join | SAME TEXT AS ABOVE |
| logical_plan after extract_equijoin_predicate | SAME TEXT AS ABOVE |
| logical_plan after simplify_expressions | SAME TEXT AS ABOVE |
| logical_plan after merge_projection | SAME TEXT AS ABOVE |
| logical_plan after rewrite_disjunctive_predicate | SAME TEXT AS ABOVE |
| logical_plan after eliminate_duplicated_expr | SAME TEXT AS ABOVE |
| logical_plan after eliminate_filter | SAME TEXT AS ABOVE |
| logical_plan after eliminate_cross_join | SAME TEXT AS ABOVE |
| logical_plan after eliminate_limit | SAME TEXT AS ABOVE |
| logical_plan after propagate_empty_relation | SAME TEXT AS ABOVE |
| logical_plan after filter_null_join_keys | SAME TEXT AS ABOVE |
| logical_plan after eliminate_outer_join | SAME TEXT AS ABOVE |
| logical_plan after push_down_limit | SAME TEXT AS ABOVE |
| logical_plan after push_down_filter | Projection: t.a - t.b |
| | Filter: t.a > Int32(1) |
| | TableScan: t |
```
After investigating, I found the cause of this problem.
The initial plan is wrong, but because we did not have an analyzer, the error was covered up by `push_down_filter`.
After `push_down_filter` runs, the wrong plan becomes a valid plan.
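To make the point above concrete, here is a minimal toy model in plain Python. It is not DataFusion code: the `TableScan`, `Projection`, `Filter`, `output_schema`, and `is_valid` names are hypothetical and exist only for this sketch, and the projection's own column references are deliberately left unchecked. It only illustrates why a filter on `a` cannot be resolved above a projection that outputs `a + b` and `a - b`, while the same filter resolves once it sits below the projection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableScan:
    columns: List[str]  # column names the table exposes


@dataclass
class Projection:
    exprs: List[str]  # output column names, e.g. "a - b"
    input: object     # child plan node


@dataclass
class Filter:
    column: str    # the single column the predicate reads
    input: object  # child plan node


def output_schema(plan) -> List[str]:
    """Return the column names a node exposes, validating filters on the way."""
    if isinstance(plan, TableScan):
        return plan.columns
    if isinstance(plan, Projection):
        output_schema(plan.input)  # validate the subtree first
        return plan.exprs          # only the projected expressions survive
    # Filter: the predicate column must exist in the input schema.
    fields = output_schema(plan.input)
    if plan.column not in fields:
        raise LookupError(f"No field named {plan.column}. Valid fields are {fields}")
    return fields


def is_valid(plan) -> bool:
    """True if every filter in the plan can resolve its column."""
    try:
        output_schema(plan)
        return True
    except LookupError:
        return False


scan = TableScan(["a", "b"])
# The plan as written: the filter sits above the projection.
as_written = Filter("a", Projection(["a + b", "a - b"], scan))
# The plan after the filter has been pushed below the projection.
after_push_down = Projection(["a + b", "a - b"], Filter("a", scan))
```

In this sketch `is_valid(as_written)` is false and `is_valid(after_push_down)` is true, which mirrors the shape of the `initial_logical_plan` and the plan after `push_down_filter` in the table above.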
[GitHub] [arrow-datafusion] jackwener commented on issue #5949: Regression in 22.0.0 with filter push-down
Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502981688
So, I think it isn't a regression.
**But**, if we want to support it, I think we can add an analyzer rule that adds the missing column. I'm not sure whether we need to do this.
[GitHub] [arrow-datafusion] jackwener commented on issue #5949: Regression in 22.0.0 with filter push-down
Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502636600
It seems that this plan is unreasonable.
```
select(
    column("a") + column("b"),
    column("a") - column("b"),
)
# yields only two columns: (a + b) and (a - b)
# so the filter can't find col(a)
```
> why this PR would have introduced this change in behavior?
In my opinion, `type coercion` was originally in the `optimizer`, and the `optimizer` can skip failed rules, so the problem was covered up.
After it was moved into the `Analyzer`, which doesn't skip failed rules, the problem was exposed.
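The difference described above can be sketched as a small rule runner in plain Python. This is an illustration under stated assumptions, not DataFusion's implementation: `run_rules` is a hypothetical helper, plans are reduced to the strings `"invalid"` and `"valid"`, and the two rule functions are stand-ins that only borrow the names `type_coercion` and `push_down_filter` from the discussion.

```python
from typing import Callable, List

Rule = Callable[[str], str]


def run_rules(plan: str, rules: List[Rule], skip_failed: bool) -> str:
    """Apply each rule in order; optionally ignore rules that raise."""
    for rule in rules:
        try:
            plan = rule(plan)
        except ValueError:
            if not skip_failed:
                raise
            # The failing rule is skipped and the plan is left untouched.
    return plan


def type_coercion(plan: str) -> str:
    # Stand-in for a rule that must resolve every column and fails on an invalid plan.
    if plan == "invalid":
        raise ValueError("Schema error: No field named a")
    return plan


def push_down_filter(plan: str) -> str:
    # Stand-in for a rewrite that happens to turn the invalid plan into a valid one.
    return "valid" if plan == "invalid" else plan


rules = [type_coercion, push_down_filter]
```

With `skip_failed=True`, `run_rules("invalid", rules, skip_failed=True)` swallows the failure in the first rule and returns `"valid"` after the second one. With `skip_failed=False`, the same call raises the `ValueError`, which corresponds to the schema error surfacing once the failing rule runs in a stage that does not skip failures.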
[GitHub] [arrow-datafusion] andygrove commented on issue #5949: Regression in 22.0.0 with filter push-down
Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on issue #5949:
URL: https://github.com/apache/arrow-datafusion/issues/5949#issuecomment-1502412542
The issue seems to have been introduced in https://github.com/apache/arrow-datafusion/pull/5831. @jackwener do you know why this PR would have introduced this change in behavior?