Posted to github@arrow.apache.org by "maxburke (via GitHub)" <gi...@apache.org> on 2023/07/09 20:57:20 UTC

[GitHub] [arrow-datafusion] maxburke opened a new issue, #6897: Fix for issue #6595 has broken existing working queries

maxburke opened a new issue, #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897

   ### Describe the bug
   
   After upgrading to DataFusion 27.0.0, we noticed that some of our regression tests were failing. We bisected the break to commit 36123ee0, which is the fix for #6595.
   
   ### To Reproduce
   
   The attached zip file contains a parquet file. To reproduce the issue in datafusion-cli, run:
   
   ```
   > create external table t0 stored as parquet location 'test_data.parquet';
   > SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM t0  GROUP BY "day" ORDER BY t0."day" ASC;
   ```
   
   In DataFusion 26 and earlier, this query returns a result. In DataFusion 27, it fails with this error message:
   
   ```
   Optimizer rule 'push_down_projection' failed
   caused by
   Error during planning: required columns can't push down, columns: {Column { relation: Some(Bare { table: "t0" }), name: "day" }, Column { relation: None, name: "num_directions" }, Column { relation: None, name: "date" }}
   ```
   
   [test_data.zip](https://github.com/apache/arrow-datafusion/files/11996072/test_data.zip)
   
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #6897: Fix for issue #6595 has broken existing working queries

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1629072081

   cc @jackwener 




[GitHub] [arrow-datafusion] alamb commented on issue #6897: Fix for issue #6595 has broken existing working queries

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1629071485

   I have verified this has been fixed on master (aka what will be released in DataFusion `28.0.0`).
   
   BTW, I added new test coverage in https://github.com/apache/arrow-datafusion/pull/6836 so that we don't break this again by accident.
   
   Since it is a regression, I would be willing to create a patch release (`27.0.1`) with the fix if that would be helpful for others.
   
   Using this query (thanks for the reproducer @maxburke 🙏 )
   
   ```sql
   SELECT 
    "day"  AS  "date", count(distinct "direction")  AS  "num_directions" 
   FROM 'test_data.parquet' 
   GROUP BY "day" 
   ORDER BY "day" ASC;
   ```
   
   ## `26.0.0` works
   ```shell
   DataFusion CLI v26.0.0
   ❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
   +---------------------+----------------+
   | date                | num_directions |
   +---------------------+----------------+
   | 2011-09-09T00:00:00 | 2              |
   | 2011-09-10T00:00:00 | 2              |
   ...
   
   | 2018-04-14T00:00:00 | 2              |
   | 2018-04-15T00:00:00 | 2              |
   +---------------------+----------------+
   81 rows in set. Query took 0.024 seconds.
   ❯
   ```
   
   ## `27.0.0` fails
   ```shell
   DataFusion CLI v27.0.0
   ❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
   Optimizer rule 'simplify_expressions' failed
   caused by
   Schema error: No field named "test_data.parquet".day. Valid fields are "test_data.parquet.day", "COUNT(DISTINCT test_data.parquet.direction)".
   ❯
   ```
   
   ## `main` passes:
   ```shell
   $ git checkout main
   Already on 'main'
   Your branch is up to date with 'apache/main'.
   $ CARGO_TARGET_DIR=/Users/alamb/Software/target-df cargo run
       Finished dev [unoptimized + debuginfo] target(s) in 0.27s
        Running `/Users/alamb/Software/target-df/debug/datafusion-cli`
   DataFusion CLI v27.0.0
   ❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
   +---------------------+----------------+
   | date                | num_directions |
   +---------------------+----------------+
   | 2011-09-09T00:00:00 | 2              |
   | 2011-09-10T00:00:00 | 2              |
   ...
   | 2018-04-14T00:00:00 | 2              |
   | 2018-04-15T00:00:00 | 2              |
   +---------------------+----------------+
   81 rows in set. Query took 0.027 seconds.
   ```




[GitHub] [arrow-datafusion] alamb commented on issue #6897: Fix for issue #6595 has broken existing working queries

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #6897:
URL: https://github.com/apache/arrow-datafusion/issues/6897#issuecomment-1628891633

   I believe this is similar to https://github.com/apache/arrow-datafusion/issues/6790, which has since been fixed. I will verify that this is the same thing (we saw a similar error in IOx).




[GitHub] [arrow-datafusion] alamb closed issue #6897: Fix for issue #6595 has broken existing working queries ("Schema error: No field named ...")

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6897: Fix for issue #6595 has broken existing working queries ("Schema error: No field named ...")
URL: https://github.com/apache/arrow-datafusion/issues/6897

