Posted to github@arrow.apache.org by "osawyerr (via GitHub)" <gi...@apache.org> on 2023/04/22 12:07:58 UTC
[GitHub] [arrow-datafusion] osawyerr opened a new issue, #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet
osawyerr opened a new issue, #6094:
URL: https://github.com/apache/arrow-datafusion/issues/6094
### Describe the bug
When using a simple glob pattern like ``*.parquet``, ListingTableUrl works and the correct files are found and queried. However, when using a more complex glob pattern like ``*.{1,2,3}.parquet``, the files are not found.
File names are in the format ``filename.1.parquet, filename.2.parquet, filename.3.parquet``.
### To Reproduce
Sample code below:
```rust
let schema = ...;

// Configure listing options
let file_format = ParquetFormat::default().with_enable_pruning(Some(true));
let mut listing_options =
    ListingOptions::new(Arc::new(file_format)).with_file_extension(".parquet");

// This glob pattern doesn't work, but *.parquet works
let glob = "*.{1,2,3}.parquet";
let table_path = &format!("/Users/olo/Documents/arrow_scratch/{glob}");
let listing_table_url = ListingTableUrl::parse(table_path).unwrap();

let config = ListingTableConfig::new(listing_table_url)
    .with_listing_options(listing_options)
    .with_schema(schema.clone());

let table_provider = Arc::new(ListingTable::try_new(config).unwrap());

let mut ctx = SessionContext::new();
ctx.register_table(
    TableReference::Bare {
        table: Cow::Borrowed("some_table"),
    },
    table_provider,
)
.unwrap();

let df = ctx.sql("select count(*) from some_table").await.unwrap();

// Returns nothing
let records = df.collect().await.unwrap();
```
### Expected behavior
Should find files and return the correct count.
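
If brace alternation turns out not to be supported by the matcher, one possible workaround is to expand the braces into separate patterns before parsing each one. The helper below is only a sketch of that idea: `expand_braces` is a hypothetical name, not part of DataFusion, and it handles a single, non-nested `{...}` group.

```rust
/// Expand the first `{a,b,c}` group in `pattern` into one pattern per alternative.
/// Hypothetical helper: handles a single, non-nested group only.
/// Patterns without a well-formed group are returned unchanged.
fn expand_braces(pattern: &str) -> Vec<String> {
    // Locate the first opening and closing brace; bail out if either is missing
    // or they appear in the wrong order.
    let (open, close) = match (pattern.find('{'), pattern.find('}')) {
        (Some(o), Some(c)) if o < c => (o, c),
        _ => return vec![pattern.to_string()],
    };

    let prefix = &pattern[..open];
    let suffix = &pattern[close + 1..];

    // Emit prefix + alternative + suffix for every comma-separated alternative.
    pattern[open + 1..close]
        .split(',')
        .map(|alt| format!("{prefix}{alt}{suffix}"))
        .collect()
}
```

Calling `expand_braces("*.{1,2,3}.parquet")` yields the three patterns `*.1.parquet`, `*.2.parquet` and `*.3.parquet`. Unlike a character set, this also covers multi-character alternatives such as `{10,11}`.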
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] osawyerr commented on issue #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet
Posted by "osawyerr (via GitHub)" <gi...@apache.org>.
osawyerr commented on issue #6094:
URL: https://github.com/apache/arrow-datafusion/issues/6094#issuecomment-1518698538
It appears that the correct pattern to use is ``*.[1,2,3].parquet``
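
For anyone landing here later: the likely reason is that matching follows the syntax of the Rust `glob` crate, which, as far as I know, has no `{a,b}` alternation and treats braces as literal characters, while `[...]` is a character set. Note that inside a set the commas are members too, so `*.[1,2,3].parquet` would also match `filename.,.parquet`; `*.[123].parquet` is the tighter form. The sketch below is a hypothetical, std-only matcher (`simple_glob_match` is not a DataFusion or `glob` API) that imitates just those rules to show the difference. It supports only `*` and plain `[...]` sets, with no ranges or negation.

```rust
/// Match `name` against `pattern`, supporting only `*` and plain `[...]` sets.
/// Braces are treated as ordinary literal characters (assumed behaviour).
fn simple_glob_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    match_from(&p, &n)
}

fn match_from(p: &[char], n: &[char]) -> bool {
    match p {
        // Pattern exhausted: succeed only if the name is exhausted too.
        [] => n.is_empty(),
        // `*` matches any run of characters, including an empty one.
        ['*', rest @ ..] => (0..=n.len()).any(|skip| match_from(rest, &n[skip..])),
        // `[...]` matches exactly one character that is a member of the set.
        ['[', rest @ ..] => {
            let end = match rest.iter().position(|&c| c == ']') {
                Some(end) => end,
                None => return false, // unterminated set: treat as no match
            };
            match n {
                [first, tail @ ..] => {
                    rest[..end].contains(first) && match_from(&rest[end + 1..], tail)
                }
                [] => false,
            }
        }
        // Any other character (including `{`, `}` and `,`) must match literally.
        [c, rest @ ..] => match n {
            [first, tail @ ..] => first == c && match_from(rest, tail),
            [] => false,
        },
    }
}
```

With this sketch, `simple_glob_match("*.[1,2,3].parquet", "filename.1.parquet")` is true, while the same file name checked against `*.{1,2,3}.parquet` is false, because the braces only match themselves.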
[GitHub] [arrow-datafusion] osawyerr closed issue #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet
Posted by "osawyerr (via GitHub)" <gi...@apache.org>.
osawyerr closed issue #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet
URL: https://github.com/apache/arrow-datafusion/issues/6094