Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/12 22:22:58 UTC

[GitHub] [arrow-datafusion] pjmore opened a new issue #1999: Error occurs when only using partition columns in query

pjmore opened a new issue #1999:
URL: https://github.com/apache/arrow-datafusion/issues/1999


   **Describe the bug**
   When using a partitioned data source, selecting only the partition columns causes an error.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   Create a sample parquet/csv file; the content doesn't matter. Copy the file to two separate paths using a path partitioning scheme, e.g.
   "year=2021/month=09/day=09/file.parquet",
   "year=2021/month=10/day=09/file.parquet",
   
   Register a listing table over these paths and execute the query
   ```select distinct year, month, day from t```
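
The file-layout step above can be sketched as a short script. This is only an illustration of the directory structure, not DataFusion code: it copies one sample file into the two partition directories listed in the report and parses the `key=value` segments back out of each path. The helper names (`build_partitioned_layout`, `partition_values`, `make_demo`) and the CSV contents are assumptions made for the example; registering the listing table is left out because the report does not say how it was done.

```python
import shutil
import tempfile
from pathlib import Path

# Partition directories taken from the report; the file name is arbitrary.
PARTITION_DIRS = [
    "year=2021/month=09/day=09",
    "year=2021/month=10/day=09",
]


def build_partitioned_layout(root: Path, sample: Path) -> list[Path]:
    """Copy `sample` into each partition directory under `root`."""
    copied = []
    for part in PARTITION_DIRS:
        target_dir = root / part
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / sample.name
        shutil.copyfile(sample, target)
        copied.append(target)
    return copied


def partition_values(root: Path, file_path: Path) -> dict[str, str]:
    """Parse the key=value directory segments between `root` and the file."""
    segments = file_path.relative_to(root).parent.parts
    return dict(seg.split("=", 1) for seg in segments)


def make_demo() -> list[dict[str, str]]:
    """Build the layout in a temporary directory and return the parsed values."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "t"
        sample = Path(tmp) / "file.csv"
        sample.write_text("a,b\n1,2\n")  # placeholder content; it does not matter
        files = build_partitioned_layout(root, sample)
        return [partition_values(root, f) for f in files]


if __name__ == "__main__":
    for row in make_demo():
        print(row)
```

Each returned dictionary holds the `year`, `month`, and `day` strings encoded in one file's path, which are the values the partition columns are expected to carry for that file.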
   
   **Expected behavior**
   Should return a record batch with:
   ```
   +------+-------+-----+
   | year | month | day |
   +------+-------+-----+
   | 2021 | 09    | 09  |
   | 2021 | 10    | 09  |
   | 2021 | 10    | 28  |
   +------+-------+-----+ 
   ```
   
   **Additional context**
   This was verified to be a problem with parquet and csv formats. Unsure about avro or json.
   

