Posted to github@arrow.apache.org by "maxburke (via GitHub)" <gi...@apache.org> on 2023/07/09 20:57:20 UTC
[GitHub] [arrow-datafusion] maxburke opened a new issue, #6897: Fix for issue #6595 has broken existing working queries
maxburke opened a new issue, #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897
### Describe the bug
After upgrading to DataFusion 27.0.0 we noticed that some of our regression tests were failing. We bisected the break to commit 36123ee0, which is the fix for #6595.
### To Reproduce
The attached zip file contains a parquet file. To reproduce the issue in datafusion-cli, run:
```
> create external table t0 stored as parquet location 'test_data.parquet';
> SELECT "day" AS "date", count(distinct "direction") AS "num_directions" FROM t0 GROUP BY "day" ORDER BY t0."day" ASC;
```
In DataFusion 26 and earlier, this query returns a result. In DataFusion 27, it fails with this error message:
```
Optimizer rule 'push_down_projection' failed
caused by
Error during planning: required columns can't push down, columns: {Column { relation: Some(Bare { table: "t0" }), name: "day" }, Column { relation: None, name: "num_directions" }, Column { relation: None, name: "date" }}
```
[test_data.zip](https://github.com/apache/arrow-datafusion/files/11996072/test_data.zip)
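For readers without the attachment, the result this query is expected to produce can be sketched in plain Python: group rows by `day`, count the distinct non-null `direction` values in each group, and sort by `day`. This is only an illustration of the SQL semantics, not DataFusion code. The function name and the sample rows are hypothetical stand-ins, since the contents of `test_data.parquet` are not shown in this thread, and the sketch assumes `day` is never null.

```python
from collections import defaultdict


def count_distinct_by_day(rows):
    """Mimic: SELECT "day" AS "date", count(distinct "direction") AS "num_directions"
    FROM t0 GROUP BY "day" ORDER BY "day" ASC.

    `rows` is an iterable of (day, direction) pairs; hypothetical input shape.
    """
    directions_per_day = defaultdict(set)
    for day, direction in rows:
        seen = directions_per_day[day]  # creates the group even if direction is None
        # count(distinct ...) ignores NULLs; None plays that role here.
        if direction is not None:
            seen.add(direction)
    # ORDER BY "day" ASC: sort on the group key only.
    ordered = sorted(directions_per_day.items(), key=lambda item: item[0])
    return [{"date": day, "num_directions": len(seen)} for day, seen in ordered]


# Hypothetical sample rows, deliberately out of order and with duplicates.
sample_rows = [
    ("2011-09-10", "out"),
    ("2011-09-09", "in"),
    ("2011-09-09", "out"),
    ("2011-09-09", "in"),
    ("2011-09-10", "in"),
]
result = count_distinct_by_day(sample_rows)
```

With these sample rows, each day ends up with two distinct directions, and the output rows are ordered by day regardless of input order.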
### Expected behavior
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on issue #6897: Fix for issue #6595 has broken existing working queries
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1629072081
cc @jackwener
[GitHub] [arrow-datafusion] alamb commented on issue #6897: Fix for issue #6595 has broken existing working queries
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1629071485
I have verified this has been fixed on master (aka what will be released in DataFusion `28.0.0`).
BTW I added new test coverage in https://github.com/apache/arrow-datafusion/pull/6836 so that we don't break this again by accident.
Since it is a regression, I would be willing to create a patch release (`27.0.1`) with the fix if that would be helpful for others.
Using this query (thanks for the reproducer @maxburke 🙏 )
```sql
SELECT
"day" AS "date", count(distinct "direction") AS "num_directions"
FROM 'test_data.parquet'
GROUP BY "day"
ORDER BY "day" ASC;
```
## `26.0.0` works
```shell
DataFusion CLI v26.0.0
❯ SELECT "day" AS "date", count(distinct "direction") AS "num_directions" FROM 'test_data.parquet' GROUP BY "day" ORDER BY "day" ASC;
+---------------------+----------------+
| date                | num_directions |
+---------------------+----------------+
| 2011-09-09T00:00:00 | 2              |
| 2011-09-10T00:00:00 | 2              |
...
| 2018-04-14T00:00:00 | 2              |
| 2018-04-15T00:00:00 | 2              |
+---------------------+----------------+
81 rows in set. Query took 0.024 seconds.
❯
```
## `27.0.0` fails
```shell
DataFusion CLI v27.0.0
❯ SELECT "day" AS "date", count(distinct "direction") AS "num_directions" FROM 'test_data.parquet' GROUP BY "day" ORDER BY "day" ASC;
Optimizer rule 'simplify_expressions' failed
caused by
Schema error: No field named "test_data.parquet".day. Valid fields are "test_data.parquet.day", "COUNT(DISTINCT test_data.parquet.direction)".
❯
```
## `main` passes:
```shell
$ git checkout main
Already on 'main'
Your branch is up to date with 'apache/main'.
$ CARGO_TARGET_DIR=/Users/alamb/Software/target-df cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.27s
Running `/Users/alamb/Software/target-df/debug/datafusion-cli`
DataFusion CLI v27.0.0
❯ SELECT "day" AS "date", count(distinct "direction") AS "num_directions" FROM 'test_data.parquet' GROUP BY "day" ORDER BY "day" ASC;
+---------------------+----------------+
| date                | num_directions |
+---------------------+----------------+
| 2011-09-09T00:00:00 | 2              |
| 2011-09-10T00:00:00 | 2              |
...
| 2018-04-14T00:00:00 | 2              |
| 2018-04-15T00:00:00 | 2              |
+---------------------+----------------+
81 rows in set. Query took 0.027 seconds.
```
[GitHub] [arrow-datafusion] alamb commented on issue #6897: Fix for issue #6595 has broken existing working queries
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1628891633
I believe this is similar to https://github.com/apache/arrow-datafusion/issues/6790, which has since been fixed. I will verify that this is the same thing (we saw a similar error in IOx).
[GitHub] [arrow-datafusion] alamb closed issue #6897: Fix for issue #6595 has broken existing working queries ("Schema error: No field named ...")
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6897: Fix for issue #6595 has broken existing working queries ("Schema error: No field named ...")
URL: https://github.com/apache/arrow-datafusion/issues/6897