You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/11 21:49:06 UTC
[GitHub] [arrow-datafusion] mvanschellebeeck opened a new pull request, #4180: Provide a builder for ListingOptions
mvanschellebeeck opened a new pull request, #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180
# Which issue does this PR close?
Closes #4178
# What changes are included in this PR?
Improves the ergonomics of [Listing Options](https://github.com/apache/arrow-datafusion/blob/a9add0e267f387559e155e9a477eee2cc99d0f2b/datafusion/core/src/datasource/listing/table.rs#L202) by provider a builder.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb closed pull request #4180: Provide a builder for ListingOptions
Posted by GitBox <gi...@apache.org>.
alamb closed pull request #4180: Provide a builder for ListingOptions
URL: https://github.com/apache/arrow-datafusion/pull/4180
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4180: Provide a builder for ListingOptions
Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#discussion_r1020756782
##########
datafusion-examples/examples/parquet_sql_multiple_files.rs:
##########
@@ -33,13 +33,11 @@ async fn main() -> Result<()> {
// Configure listing options
let file_format = ParquetFormat::default().with_enable_pruning(true);
- let listing_options = ListingOptions {
- file_extension: FileType::PARQUET.get_ext(),
- format: Arc::new(file_format),
- table_partition_cols: vec![],
- collect_stat: true,
- target_partitions: 1,
- };
+ let listing_options = ListingOptions::new(Arc::new(file_format))
+ .with_file_extension(FileType::PARQUET.get_ext())
+ .with_table_partition_cols(vec![])
+ .with_collect_stat(true)
+ .with_target_partitions(1);
Review Comment:
We could further simplify this code and avoid explicitly specifying the default values (last three lines)
But that can also be done as a follow on PR
##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -238,6 +238,73 @@ impl ListingOptions {
target_partitions: 1,
}
}
+ /// Set file extension on [`ListingOptions`] and returns self.
+ ///
+ /// ```
+ /// use std::sync::Arc;
+ /// use datafusion::datasource::{listing::ListingOptions, file_format::parquet::ParquetFormat};
+ ///
+ /// let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
+ /// .with_file_extension(".parquet");
+ ///
+ /// assert_eq!(listing_options.file_extension, ".parquet");
+ /// ```
+ pub fn with_file_extension(mut self, file_extension: impl Into<String>) -> Self {
+ self.file_extension = file_extension.into();
+ self
+ }
+
+ /// Set table partition column names on [`ListingOptions`] and returns self.
+ ///
+ /// ```
+ /// use std::sync::Arc;
+ /// use datafusion::datasource::{listing::ListingOptions, file_format::parquet::ParquetFormat};
+ ///
+ /// let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
+ /// .with_table_partition_cols(vec!["col_a".to_string(), "col_b".to_string()]);
+ ///
+ /// assert_eq!(listing_options.table_partition_cols, vec!["col_a", "col_b"]);
+ /// ```
+ pub fn with_table_partition_cols(
+ mut self,
+ table_partition_cols: Vec<String>,
+ ) -> Self {
+ self.table_partition_cols = table_partition_cols;
+ self
+ }
+
+ /// Set stat collection on [`ListingOptions`] and returns self.
+ ///
+ /// ```
+ /// use std::sync::Arc;
+ /// use datafusion::datasource::{listing::ListingOptions, file_format::parquet::ParquetFormat};
+ ///
+ /// // Enable stat collection
+ /// let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
+ /// .with_collect_stat(true);
+ ///
+ /// assert_eq!(listing_options.collect_stat, true);
+ /// ```
+ pub fn with_collect_stat(mut self, collect_stat: bool) -> Self {
+ self.collect_stat = collect_stat;
+ self
+ }
+
+ /// Set number of target partitions on [`ListingOptions`] and returns self.
Review Comment:
❤️
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4180: Provide a builder for ListingOptions
Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#issuecomment-1314457561
Thanks again @mvanschellebeeck -- I took the liberty of merging up from master to resolve the conflicts (that I caused with https://github.com/apache/arrow-datafusion/commit/d2814c960168b45c4a0f5d7bbb72d9f412cb08bd / https://github.com/apache/arrow-datafusion/pull/4170)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4180: Provide a builder for ListingOptions
Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#issuecomment-1314460704
I couldn't push directly to this PR
```
$ git push
Enumerating objects: 63, done.
Counting objects: 100% (63/63), done.
Delta compression using up to 16 threads
Compressing objects: 100% (27/27), done.
Writing objects: 100% (27/27), 3.26 KiB | 41.00 KiB/s, done.
Total 27 (delta 22), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (22/22), completed with 16 local objects.
To github.com:mvanschellebeeck/arrow-datafusion.git
! [remote rejected] mvanschellebeeck_master -> mvanschellebeeck_master (permission denied)
error: failed to push some refs to 'github.com:mvanschellebeeck/arrow-datafusion.git'
```
But I put my proposed changes in
https://github.com/apache/arrow-datafusion/pull/4207
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4180: Provide a builder for ListingOptions
Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#issuecomment-1315600392
Thanks again @mvanschellebeeck -- this is so much nicer
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org