You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/11 21:49:06 UTC

[GitHub] [arrow-datafusion] mvanschellebeeck opened a new pull request, #4180: Provide a builder for ListingOptions

mvanschellebeeck opened a new pull request, #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180

   # Which issue does this PR close?
   
   Closes #4178
   
   # What changes are included in this PR?
   
   Improves the ergonomics of [Listing Options](https://github.com/apache/arrow-datafusion/blob/a9add0e267f387559e155e9a477eee2cc99d0f2b/datafusion/core/src/datasource/listing/table.rs#L202) by provider a builder.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed pull request #4180: Provide a builder for ListingOptions

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #4180: Provide a builder for ListingOptions
URL: https://github.com/apache/arrow-datafusion/pull/4180


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4180: Provide a builder for ListingOptions

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#discussion_r1020756782


##########
datafusion-examples/examples/parquet_sql_multiple_files.rs:
##########
@@ -33,13 +33,11 @@ async fn main() -> Result<()> {
 
     // Configure listing options
     let file_format = ParquetFormat::default().with_enable_pruning(true);
-    let listing_options = ListingOptions {
-        file_extension: FileType::PARQUET.get_ext(),
-        format: Arc::new(file_format),
-        table_partition_cols: vec![],
-        collect_stat: true,
-        target_partitions: 1,
-    };
+    let listing_options = ListingOptions::new(Arc::new(file_format))
+        .with_file_extension(FileType::PARQUET.get_ext())
+        .with_table_partition_cols(vec![])
+        .with_collect_stat(true)
+        .with_target_partitions(1);

Review Comment:
   We could further  simplify this code and avoid explicitly specifying the default values (last three lines)
   
   But that can also be done as a follow on PR



##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -238,6 +238,73 @@ impl ListingOptions {
             target_partitions: 1,
         }
     }
+    /// Set file extension on [`ListingOptions`] and returns self.
+    ///
+    /// ```
+    /// use std::sync::Arc;
+    /// use datafusion::datasource::{listing::ListingOptions, file_format::parquet::ParquetFormat};
+    ///
+    /// let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
+    ///     .with_file_extension(".parquet");
+    ///
+    /// assert_eq!(listing_options.file_extension, ".parquet");
+    /// ```
+    pub fn with_file_extension(mut self, file_extension: impl Into<String>) -> Self {
+        self.file_extension = file_extension.into();
+        self
+    }
+
+    /// Set table partition column names on [`ListingOptions`] and returns self.
+    ///
+    /// ```
+    /// use std::sync::Arc;
+    /// use datafusion::datasource::{listing::ListingOptions, file_format::parquet::ParquetFormat};
+    ///
+    /// let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
+    ///     .with_table_partition_cols(vec!["col_a".to_string(), "col_b".to_string()]);
+    ///
+    /// assert_eq!(listing_options.table_partition_cols, vec!["col_a", "col_b"]);
+    /// ```
+    pub fn with_table_partition_cols(
+        mut self,
+        table_partition_cols: Vec<String>,
+    ) -> Self {
+        self.table_partition_cols = table_partition_cols;
+        self
+    }
+
+    /// Set stat collection on [`ListingOptions`] and returns self.
+    ///
+    /// ```
+    /// use std::sync::Arc;
+    /// use datafusion::datasource::{listing::ListingOptions, file_format::parquet::ParquetFormat};
+    ///
+    /// // Enable stat collection
+    /// let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
+    ///     .with_collect_stat(true);
+    ///
+    /// assert_eq!(listing_options.collect_stat, true);
+    /// ```
+    pub fn with_collect_stat(mut self, collect_stat: bool) -> Self {
+        self.collect_stat = collect_stat;
+        self
+    }
+
+    /// Set number of target partitions on [`ListingOptions`] and returns self.

Review Comment:
   ❤️ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4180: Provide a builder for ListingOptions

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#issuecomment-1314457561

   Thanks again @mvanschellebeeck  -- I took the liberty of merging up from master to resolve the conflicts (that I caused with https://github.com/apache/arrow-datafusion/commit/d2814c960168b45c4a0f5d7bbb72d9f412cb08bd / https://github.com/apache/arrow-datafusion/pull/4170)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4180: Provide a builder for ListingOptions

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#issuecomment-1314460704

   I couldn't push directly to this PR 
   
   ```
   $ git  push
   Enumerating objects: 63, done.
   Counting objects: 100% (63/63), done.
   Delta compression using up to 16 threads
   Compressing objects: 100% (27/27), done.
   Writing objects: 100% (27/27), 3.26 KiB | 41.00 KiB/s, done.
   Total 27 (delta 22), reused 0 (delta 0), pack-reused 0
   remote: Resolving deltas: 100% (22/22), completed with 16 local objects.        
   To github.com:mvanschellebeeck/arrow-datafusion.git
    ! [remote rejected]     mvanschellebeeck_master -> mvanschellebeeck_master (permission denied)
   error: failed to push some refs to 'github.com:mvanschellebeeck/arrow-datafusion.git'
   
   ```
   
   But I put my proposed changes in 
   https://github.com/apache/arrow-datafusion/pull/4207


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #4180: Provide a builder for ListingOptions

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #4180:
URL: https://github.com/apache/arrow-datafusion/pull/4180#issuecomment-1315600392

   Thanks again @mvanschellebeeck  -- this is so much nicer


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org