Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/27 13:39:47 UTC
[GitHub] [arrow-datafusion] dev870 commented on issue #1484: Is it possible to query multiple parquet files ?
dev870 commented on issue #1484:
URL: https://github.com/apache/arrow-datafusion/issues/1484#issuecomment-1001575085
Thanks a lot, @Igosuki. Based on your example, I tried the code below, but I only get the row count from the first Parquet file inside `/path/to/datafusion-parquet/data/`. I was expecting results from all the files in that path.
I am struggling to understand what I am missing here.
```rust
use std::sync::Arc;

use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::error::Result;
use datafusion::prelude::*;

/// This example demonstrates executing a simple query against an Arrow data source (Parquet) and
/// fetching results
#[tokio::main]
async fn main() -> Result<()> {
    // create local execution context
    let mut ctx = ExecutionContext::new();

    // treat every file ending in `.parquet` under the directory as part of one table
    let file_format = ParquetFormat::default().with_enable_pruning(true);
    let listing_options = ListingOptions {
        file_extension: ".parquet".to_owned(),
        format: Arc::new(file_format),
        table_partition_cols: vec![],
        collect_stat: true,
        target_partitions: 1,
    };

    // register the directory as a listing table named `my_table`
    ctx.register_listing_table(
        "my_table",
        &format!("file://{}", "/path/to/datafusion-parquet/data/"),
        listing_options,
        None,
    )
    .await?;

    // execute the query
    let df = ctx.sql("SELECT * FROM my_table").await?;

    // collect the results and print the row count of the first returned batch
    let batches = df.collect().await?;
    println!("{}", batches[0].num_rows());
    Ok(())
}
```
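The symptom may come from how the results are counted rather than from the listing table: `collect()` returns a `Vec<RecordBatch>`, and a scan over several Parquet files typically yields at least one batch per file, so `batches[0].num_rows()` only reports the rows in the first batch. A minimal std-only sketch of the difference, assuming a hypothetical `Batch` type standing in for arrow's `RecordBatch` (only its row count matters here):

```rust
/// Hypothetical stand-in for arrow's `RecordBatch`; only the row count is modeled.
struct Batch {
    rows: usize,
}

impl Batch {
    /// Mirrors `RecordBatch::num_rows()`.
    fn num_rows(&self) -> usize {
        self.rows
    }
}

/// Rows in the first batch only, which is what `batches[0].num_rows()` reports
/// (returns 0 instead of panicking when there are no batches).
fn first_batch_rows(batches: &[Batch]) -> usize {
    batches.first().map_or(0, |b| b.num_rows())
}

/// Rows across every batch returned by `collect()`.
fn total_rows(batches: &[Batch]) -> usize {
    batches.iter().map(|b| b.num_rows()).sum()
}
```

With the real DataFusion result, the same total would presumably be `batches.iter().map(|b| b.num_rows()).sum::<usize>()`; alternatively, `SELECT COUNT(*) FROM my_table` lets the engine do the counting and returns a single value.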
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.