Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 02:22:54 UTC
[GitHub] [arrow-datafusion] jychen7 opened a new issue #2061: CreateExternalTable DDL supports table_partition_cols
jychen7 opened a new issue #2061:
URL: https://github.com/apache/arrow-datafusion/issues/2061
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
Assume we have a data lake stored as
```
table/year=2022/month=03/day=20/log.parquet
table/year=2022/month=03/day=21/log.parquet
```
Currently, `CreateExternalTable` supports defining columns and a location (e.g. `table/`):
https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/sql/parser.rs#L70-L81
A SQL query such as `select * from table where year = '2022' and month = '03' and day = '20'` seems to scan all files under `table/`, even though only files under `table/year=2022/month=03/day=20/` can match.
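To make the intended behavior concrete, here is a minimal, standard-library-only sketch of Hive-style partition pruning over the paths above. It is not DataFusion's implementation; `parse_partition_values` and `prune_files` are hypothetical helper names, and only equality filters on string values are handled.

```rust
use std::collections::HashMap;

/// Extract Hive-style `key=value` segments from a file path.
/// Segments without `=` (e.g. `table`, `log.parquet`) are ignored.
fn parse_partition_values(path: &str) -> HashMap<String, String> {
    path.split('/')
        .filter_map(|segment| segment.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

/// Keep only the files whose partition values satisfy every
/// `column = value` filter. A file lacking a filtered column is dropped.
fn prune_files<'a>(files: &[&'a str], filters: &[(&str, &str)]) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|path| {
            let values = parse_partition_values(path);
            filters
                .iter()
                .all(|(col, expected)| values.get(*col).map(String::as_str) == Some(*expected))
        })
        .collect()
}

fn main() {
    let files = [
        "table/year=2022/month=03/day=20/log.parquet",
        "table/year=2022/month=03/day=21/log.parquet",
    ];
    // Mirrors: where year = '2022' and month = '03' and day = '20'
    let filters = [("year", "2022"), ("month", "03"), ("day", "20")];
    let kept = prune_files(&files, &filters);
    println!("files to scan: {:?}", kept);
}
```

With the partition columns known, only the `day=20` file would need to be read; without them, every file under `table/` is a candidate.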
**Describe the solution you'd like**
```
CREATE EXTERNAL TABLE test (
    c1 VARCHAR NOT NULL
)
STORED AS CSV
WITH HEADER ROW
PARTITIONED BY (p1, p2)
LOCATION '/path/to/';
```
As with the existing `ListingOptions`, `PARTITIONED BY` would only support string-typed partition columns:
https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/datasource/listing/table.rs#L178
**Additional context**
`PARTITIONED BY` is also used in Trino and AWS Athena:
https://trino.io/episodes/5.html
https://docs.aws.amazon.com/athena/latest/ug/create-table.html
I notice that `ListingOptions` already supports `table_partition_cols` as well as partition pruning, but `CreateExternalTable` does not accept such input and pass it through:
https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/datasource/listing/table.rs#L165-L186
https://github.com/apache/arrow-datafusion/blob/5936edc2a94d5fb20702a41eab2b80695961b9dc/datafusion/src/datasource/listing/table.rs#L358-L365
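A rough sketch of the pass-through being asked for is below. The structs `CreateExternalTableSketch` and `ListingOptionsSketch`, and the functions `parse_partitioned_by` and `listing_options_from`, are hypothetical stand-ins written for illustration; they are not DataFusion's real types or parser, and the clause parsing is deliberately simplistic (case-sensitive keyword, plain identifiers only).

```rust
/// Hypothetical stand-in for the parsed DDL statement (not DataFusion's type).
#[derive(Debug, PartialEq)]
struct CreateExternalTableSketch {
    name: String,
    location: String,
    /// Column names taken from `PARTITIONED BY (...)`.
    table_partition_cols: Vec<String>,
}

/// Hypothetical stand-in for the listing options (not DataFusion's type).
#[derive(Debug, PartialEq)]
struct ListingOptionsSketch {
    table_partition_cols: Vec<String>,
}

/// Parse the column list out of a `PARTITIONED BY (a, b)` clause.
/// Returns `None` if the clause is malformed or names no columns.
fn parse_partitioned_by(clause: &str) -> Option<Vec<String>> {
    let rest = clause.trim().strip_prefix("PARTITIONED BY")?.trim();
    let inner = rest.strip_prefix('(')?.strip_suffix(')')?;
    let cols: Vec<String> = inner
        .split(',')
        .map(|col| col.trim().to_string())
        .collect();
    if cols.iter().any(|col| col.is_empty()) {
        return None;
    }
    Some(cols)
}

/// Forward the partition columns from the DDL statement to the listing options.
fn listing_options_from(ddl: &CreateExternalTableSketch) -> ListingOptionsSketch {
    ListingOptionsSketch {
        table_partition_cols: ddl.table_partition_cols.clone(),
    }
}

fn main() {
    let cols = parse_partitioned_by("PARTITIONED BY (p1, p2)").expect("well-formed clause");
    let ddl = CreateExternalTableSketch {
        name: "test".to_string(),
        location: "/path/to/".to_string(),
        table_partition_cols: cols,
    };
    let options = listing_options_from(&ddl);
    println!("{} at {} -> {:?}", ddl.name, ddl.location, options);
}
```

The point of the sketch is only the data flow: the DDL statement carries the column names, and whatever builds the listing table copies them into its options so the existing pruning logic can use them.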
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb closed issue #2061: CreateExternalTable DDL supports table_partition_cols
Posted by GitBox <gi...@apache.org>.
alamb closed issue #2061:
URL: https://github.com/apache/arrow-datafusion/issues/2061
[GitHub] [arrow-datafusion] alamb commented on issue #2061: CreateExternalTable DDL supports table_partition_cols
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2061:
URL: https://github.com/apache/arrow-datafusion/issues/2061#issuecomment-1076749355
sounds like a good enhancement to me