Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 02:22:54 UTC

[GitHub] [arrow-datafusion] jychen7 opened a new issue #2061: CreateExternalTable DDL supports table_partition_cols

jychen7 opened a new issue #2061:
URL: https://github.com/apache/arrow-datafusion/issues/2061


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Assume we have a data lake stored as:
   ```
   table/year=2022/month=03/day=20/log.parquet
   table/year=2022/month=03/day=21/log.parquet
   ```
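   Directories named `key=value` like these are commonly called Hive-style partitioning. As a minimal sketch (the helper `parse_partition_values` is hypothetical, not a DataFusion API), the partition values for one file could be read back from its path like this:

   ```rust
   /// Hypothetical helper: extract the values of the expected partition columns
   /// from a Hive-style path such as `table/year=2022/month=03/day=20/log.parquet`.
   /// Returns `None` if the `key=value` segments do not match `partition_cols` exactly, in order.
   fn parse_partition_values(path: &str, partition_cols: &[&str]) -> Option<Vec<String>> {
       // Collect every `key=value` segment; segments without `=`
       // (such as `table` or `log.parquet`) are skipped.
       let pairs: Vec<(&str, &str)> = path
           .split('/')
           .filter_map(|segment| segment.split_once('='))
           .collect();
       if pairs.len() != partition_cols.len() {
           return None;
       }
       // Each key must equal the declared partition column at the same position.
       pairs
           .iter()
           .zip(partition_cols)
           .map(|((key, value), col)| (key == col).then(|| value.to_string()))
           .collect()
   }
   ```

   Returning `None` for a path whose partition keys are missing or out of order is a choice made for this sketch; a real implementation might report an error instead.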
   
   Currently, `CreateExternalTable` supports defining columns and a location (e.g. `table/`):
   
   https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/sql/parser.rs#L70-L81
   
   A SQL query such as `select * from table where year = '2022' and month = '03' and day = '20'` seems to scan all files under `table/`, even though only the files under `year=2022/month=03/day=20/` can match.
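   Partition pruning, which the issue notes `ListingOptions` already supports, would instead skip any file whose `key=value` directories contradict the filter. A minimal sketch of the idea, restricted to equality filters; `prune_files` is a hypothetical helper and not DataFusion's implementation:

   ```rust
   use std::collections::HashMap;

   /// Hypothetical pruning step: keep only the files whose Hive-style partition
   /// directories satisfy every `column = 'value'` equality filter.
   /// Columns that appear in no filter are unconstrained.
   fn prune_files<'a>(files: &[&'a str], filters: &HashMap<&str, &str>) -> Vec<&'a str> {
       files
           .iter()
           .copied()
           .filter(|path| {
               // Map each `key=value` path segment's key to its value.
               let parts: HashMap<&str, &str> = path
                   .split('/')
                   .filter_map(|segment| segment.split_once('='))
                   .collect();
               // The file survives only if every filtered column is present with the wanted value.
               filters
                   .iter()
                   .all(|(col, want)| parts.get(*col).copied() == Some(*want))
           })
           .collect()
   }
   ```

   With the filters `year = '2022'`, `month = '03'` and `day = '20'`, only `table/year=2022/month=03/day=20/log.parquet` would be kept from the two files listed above.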
   
   **Describe the solution you'd like**
   ```
   CREATE EXTERNAL TABLE test (
       c1  VARCHAR NOT NULL
   )
   STORED AS CSV
   WITH HEADER ROW
   PARTITIONED BY (p1, p2)
   LOCATION '/path/to/';
   ```
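   One way the proposed clause might be parsed: a hand-written sketch that turns the text after `PARTITIONED BY` into a list of column names. `parse_partitioned_by` is hypothetical; DataFusion's parser (linked above) builds on the `sqlparser` crate, so a real implementation would more likely reuse that tokenizer.

   ```rust
   /// Hypothetical parser for the body of a `PARTITIONED BY (p1, p2)` clause.
   /// Accepts a parenthesised, comma-separated list of bare identifiers and
   /// returns the column names in declaration order.
   fn parse_partitioned_by(clause: &str) -> Result<Vec<String>, String> {
       let inner = clause
           .trim()
           .strip_prefix('(')
           .and_then(|rest| rest.strip_suffix(')'))
           .ok_or_else(|| format!("expected `(col, ...)`, found `{}`", clause.trim()))?;
       let mut cols: Vec<String> = Vec::new();
       for raw in inner.split(',') {
           let name = raw.trim();
           // Identifier: an ASCII letter or `_`, then ASCII letters, digits or `_`.
           let mut chars = name.chars();
           let valid = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
               && chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
           if !valid {
               return Err(format!("invalid partition column name `{}`", name));
           }
           // Reject the same partition column listed twice.
           if cols.iter().any(|c| c == name) {
               return Err(format!("duplicate partition column `{}`", name));
           }
           cols.push(name.to_string());
       }
       Ok(cols)
   }
   ```

   For the statement above, `parse_partitioned_by("(p1, p2)")` yields the columns `p1` and `p2`; an empty list `()` is rejected because its single segment is not an identifier.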
   As with the existing `ListingOptions`, `PARTITIONED BY` would only support String partition columns:
   https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/datasource/listing/table.rs#L178
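   To illustrate what "only supports String" would mean for the resulting table, here is a sketch that adds the partition columns to the declared columns with a string type. `ColType` and `table_schema` are hypothetical simplifications of an Arrow schema, and placing the partition columns after the file columns is an assumption of this sketch:

   ```rust
   /// Hypothetical, simplified column types (a stand-in for Arrow data types).
   #[derive(Debug, Clone, PartialEq)]
   enum ColType {
       Utf8,
       Int64,
   }

   /// Sketch: the table schema is the declared file columns followed by the
   /// partition columns, which are always typed as strings.
   fn table_schema(file_cols: &[(&str, ColType)], partition_cols: &[&str]) -> Vec<(String, ColType)> {
       file_cols
           .iter()
           .map(|(name, ty)| (name.to_string(), ty.clone()))
           // Partition values come from directory names, so they are strings.
           .chain(partition_cols.iter().map(|name| (name.to_string(), ColType::Utf8)))
           .collect()
   }
   ```

   Typing every partition column as a string keeps the sketch simple; a filter such as `year = '2022'` then compares strings, which is why the example query quotes its values.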
   
   **Describe alternatives you've considered**
   
   **Additional context**
   `PARTITIONED BY` is also used in Trino and AWS Athena:
   https://trino.io/episodes/5.html
   https://docs.aws.amazon.com/athena/latest/ug/create-table.html
   
   I noticed that `ListingOptions` already supports `table_partition_cols` as well as partition pruning; it is only `CreateExternalTable` that does not accept this input and pass it through:
   https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/datasource/listing/table.rs#L165-L186
   https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/datasource/listing/table.rs#L358-L365
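   A sketch of the missing pass-through, using hypothetical stand-in structs (`CreateExternalTableSketch`, `ListingConfigSketch`) rather than DataFusion's real types; only the `table_partition_cols` name comes from the issue. Rejecting a partition column that is also declared as a regular column is one plausible validation, not something the issue specifies:

   ```rust
   /// Hypothetical stand-in for the parsed `CREATE EXTERNAL TABLE` statement,
   /// extended with the `table_partition_cols` field this issue asks for.
   struct CreateExternalTableSketch {
       columns: Vec<String>,
       location: String,
       table_partition_cols: Vec<String>,
   }

   /// Hypothetical stand-in for the listing configuration the planner builds.
   #[derive(Debug, PartialEq)]
   struct ListingConfigSketch {
       table_path: String,
       table_partition_cols: Vec<String>,
   }

   /// Copy the partition columns from the DDL into the listing configuration,
   /// rejecting a partition column that is also declared as a file column.
   fn to_listing_config(stmt: &CreateExternalTableSketch) -> Result<ListingConfigSketch, String> {
       if let Some(dup) = stmt
           .table_partition_cols
           .iter()
           .find(|p| stmt.columns.contains(*p))
       {
           return Err(format!("partition column `{}` is also declared as a table column", dup));
       }
       Ok(ListingConfigSketch {
           table_path: stmt.location.clone(),
           table_partition_cols: stmt.table_partition_cols.clone(),
       })
   }
   ```

   For the example DDL (column `c1`, partitions `p1, p2`, location `/path/to/`), the partition columns would reach the listing configuration unchanged.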
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #2061: CreateExternalTable DDL supports table_partition_cols

Posted by GitBox <gi...@apache.org>.
alamb closed issue #2061:
URL: https://github.com/apache/arrow-datafusion/issues/2061


   





[GitHub] [arrow-datafusion] alamb commented on issue #2061: CreateExternalTable DDL supports table_partition_cols

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2061:
URL: https://github.com/apache/arrow-datafusion/issues/2061#issuecomment-1076749355


   sounds like a good enhancement to me

