Posted to github@arrow.apache.org by "osawyerr (via GitHub)" <gi...@apache.org> on 2023/04/22 12:07:58 UTC

[GitHub] [arrow-datafusion] osawyerr opened a new issue, #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet

osawyerr opened a new issue, #6094:
URL: https://github.com/apache/arrow-datafusion/issues/6094

   ### Describe the bug
   
   When using a simple glob pattern like ``*.parquet``, ListingTableUrl works: the correct files are found and queried. However, when using a more complex glob pattern like ``*.{1,2,3}.parquet``, the files are not found.
   
   File names are in the format ``filename.1.parquet, filename.2.parquet, filename.3.parquet``.
   
   ### To Reproduce
   
   Sample code below:
   
   ```rust
   let schema = ...;
   
   // Configure listing options
   let file_format = ParquetFormat::default().with_enable_pruning(Some(true));
   let mut listing_options =
       ListingOptions::new(Arc::new(file_format)).with_file_extension(".parquet");
   
   // This glob pattern doesn't match any files, but `*.parquet` works
   let glob = "*.{1,2,3}.parquet";
   let table_path = &format!("/Users/olo/Documents/arrow_scratch/{glob}");
   
   let listing_table_url = ListingTableUrl::parse(table_path).unwrap();
   let config = ListingTableConfig::new(listing_table_url)
       .with_listing_options(listing_options)
       .with_schema(schema.clone());
   let table_provider = Arc::new(ListingTable::try_new(config).unwrap());
   
   let mut ctx = SessionContext::new();
   ctx.register_table(
       TableReference::Bare {
           table: Cow::Borrowed("some_table"),
       },
       table_provider,
   )
   .unwrap();
   
   let df = ctx.sql("select count(*) from some_table").await.unwrap();
   
   //returns nothing
   let records = df.collect().await.unwrap();
   ```
   
   ### Expected behavior
   
   Should find files and return the correct count.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] osawyerr commented on issue #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet

Posted by "osawyerr (via GitHub)" <gi...@apache.org>.
osawyerr commented on issue #6094:
URL: https://github.com/apache/arrow-datafusion/issues/6094#issuecomment-1518698538

   It appears that the correct pattern to use is ``*.[1,2,3].parquet``
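
To illustrate why the bracket form matches while the brace form does not, here is a minimal, std-only sketch of a glob matcher. It is not the code behind ListingTableUrl; it assumes glob-crate-style syntax in which `*` and `[...]` are special and braces are ordinary characters. The function names `glob_match` and `match_from` are hypothetical. Note that `[1,2,3]` is a character class, so under these assumptions the commas are members of the set too, and `[123]` would be the tighter spelling.

```rust
/// Illustrative matcher only: supports `*` (any run of characters) and
/// `[...]` (exactly one character from a set). Every other character,
/// including `{`, `,` and `}`, is matched literally.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    match_from(&p, &t)
}

fn match_from(p: &[char], t: &[char]) -> bool {
    match p.first().copied() {
        // Pattern exhausted: succeed only if the text is exhausted too.
        None => t.is_empty(),
        // `*`: try consuming 0, 1, 2, ... characters of the text.
        Some('*') => (0..=t.len()).any(|i| match_from(&p[1..], &t[i..])),
        Some('[') => match p.iter().position(|&c| c == ']') {
            // A non-empty class: the next text character must be in the set.
            Some(end) if end > 1 => {
                let set = &p[1..end];
                match t.first() {
                    Some(c) if set.contains(c) => match_from(&p[end + 1..], &t[1..]),
                    _ => false,
                }
            }
            // No usable closing bracket: treat `[` as a literal character.
            _ => !t.is_empty() && t[0] == '[' && match_from(&p[1..], &t[1..]),
        },
        // Any other character must match exactly.
        Some(c) => !t.is_empty() && t[0] == c && match_from(&p[1..], &t[1..]),
    }
}

fn main() {
    let files = ["filename.1.parquet", "filename.2.parquet", "filename.4.parquet"];
    for pattern in ["*.[1,2,3].parquet", "*.{1,2,3}.parquet"] {
        for file in files {
            println!("{pattern} vs {file}: {}", glob_match(pattern, file));
        }
    }
}
```

In this sketch the brace pattern only matches a file whose name literally contains `{1,2,3}`, which is consistent with the empty result reported above.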




[GitHub] [arrow-datafusion] osawyerr closed issue #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet

Posted by "osawyerr (via GitHub)" <gi...@apache.org>.
osawyerr closed issue #6094: ListingTableUrl doesn't work with some glob patterns e.g *.{1,2,3}.parquet
URL: https://github.com/apache/arrow-datafusion/issues/6094

