Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/20 16:51:17 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3885: Consolidate remaining parquet config options into ConfigOptions

alamb commented on code in PR #3885:
URL: https://github.com/apache/arrow-datafusion/pull/3885#discussion_r1000878323


##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -52,59 +57,70 @@ pub const DEFAULT_PARQUET_EXTENSION: &str = ".parquet";
 /// The Apache Parquet `FileFormat` implementation
 #[derive(Debug)]
 pub struct ParquetFormat {
-    enable_pruning: bool,

Review Comment:
   Here was one copy of (some of) the settings.



##########
datafusion/core/src/execution/context.rs:
##########
@@ -1142,8 +1146,6 @@ pub struct SessionConfig {
     /// Should DataFusion repartition data using the partition keys to execute window functions in
     /// parallel using the provided `target_partitions` level
     pub repartition_windows: bool,
-    /// Should DataFusion parquet reader using the predicate to prune data
-    pub parquet_pruning: bool,

Review Comment:
   Here is a second copy of one of the settings.



##########
datafusion/core/src/execution/options.rs:
##########
@@ -168,56 +170,31 @@ pub struct ParquetReadOptions<'a> {
     pub file_extension: &'a str,
     /// Partition Columns
     pub table_partition_cols: Vec<String>,
-    /// Should DataFusion parquet reader use the predicate to prune data,

Review Comment:
   Here is a third copy, again of only some of the settings.
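The consolidation these three comments point at can be sketched as a single keyed store that every component reads from, instead of each struct holding its own private field. This is a minimal illustration under stated assumptions: `ConfigStore`, `FormatSketch`, and the key `datafusion.execution.parquet.pruning` are hypothetical names chosen for the example, not DataFusion's actual `ConfigOptions` API.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a consolidated options store.
/// It is NOT DataFusion's `ConfigOptions`; it only illustrates keeping a
/// single copy of each setting, looked up by its fully qualified name.
#[derive(Debug, Default)]
struct ConfigStore {
    bools: BTreeMap<String, bool>,
}

impl ConfigStore {
    /// Store (or overwrite) the one shared value for `key`.
    fn set_bool(&mut self, key: &str, value: bool) {
        self.bools.insert(key.to_string(), value);
    }

    /// Read the shared value, falling back to `default` when unset.
    fn get_bool(&self, key: &str, default: bool) -> bool {
        self.bools.get(key).copied().unwrap_or(default)
    }
}

/// Assumed key name, used only for this illustration.
const PRUNING_KEY: &str = "datafusion.execution.parquet.pruning";

/// Each former owner of a private copy (`ParquetFormat`, `SessionConfig`,
/// `ParquetReadOptions`) would instead borrow the shared store.
struct FormatSketch<'a> {
    config: &'a ConfigStore,
}

impl FormatSketch<'_> {
    fn pruning_enabled(&self) -> bool {
        // Falls back to `true` when the key was never set (assumed default).
        self.config.get_bool(PRUNING_KEY, true)
    }
}
```

Because there is only one entry per setting, changing it in one place is visible to every reader, which is exactly what three independent copies could not guarantee.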



##########
datafusion/core/tests/sql/information_schema.rs:
##########
@@ -702,8 +702,11 @@ async fn show_all() {
         "| datafusion.execution.coalesce_batches           | true    |",
         "| datafusion.execution.coalesce_target_batch_size | 4096    |",
         "| datafusion.execution.parquet.enable_page_index  | false   |",
+        "| datafusion.execution.parquet.metadata_size_hint | NULL    |",

Review Comment:
   Defaults are now even clearer!
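The new `metadata_size_hint` row shows an optional setting with no default being rendered as `NULL`. The sketch below reproduces that row layout for illustration only; `render_setting` and `render_row` are hypothetical helpers that mirror the expected test output above, not DataFusion's actual `information_schema` formatting code. The column widths (47 for the key, 7 for the value) are read off the expected rows in the test.

```rust
/// Hypothetical helper: an unset optional setting is displayed as `NULL`,
/// matching the expected `SHOW ALL` output in the test above.
fn render_setting(value: Option<&str>) -> String {
    value.map(str::to_string).unwrap_or_else(|| "NULL".to_string())
}

/// Hypothetical helper: pad key and value to the column widths used by
/// the expected rows (47 and 7 characters respectively).
fn render_row(key: &str, value: Option<&str>) -> String {
    format!("| {:<47} | {:<7} |", key, render_setting(value))
}
```

For example, `render_row("datafusion.execution.parquet.metadata_size_hint", None)` yields the same line as the new expected row in the test.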



##########
datafusion/core/tests/sql/parquet_schema.rs:
##########
@@ -141,6 +139,13 @@ async fn schema_merge_can_preserve_metadata() {
     let table_path = table_dir.to_str().unwrap().to_string();
 
     let ctx = SessionContext::new();
+
+    // explicitly disable schema clearing
+    ctx.config_options()

Review Comment:
   Ideally we will have a nicer API for this: https://github.com/apache/arrow-datafusion/issues/3908
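One shape such a nicer API could take is a typed, chainable setter layered over the string-keyed store, so callers do not have to spell out raw key strings. This is only a hypothetical sketch: `Options`, `with_parquet_skip_metadata`, and the key `datafusion.execution.parquet.skip_metadata` are assumptions made for illustration, not what issue #3908 or DataFusion actually adopted.

```rust
use std::collections::BTreeMap;

/// Assumed key name for the "clear schema metadata" setting (illustrative only).
const SKIP_METADATA_KEY: &str = "datafusion.execution.parquet.skip_metadata";

/// Hypothetical options type; not a DataFusion API.
#[derive(Debug, Default)]
struct Options {
    bools: BTreeMap<String, bool>,
}

impl Options {
    /// Low-level, string-keyed setter: a typo in `key` is not caught.
    fn set_bool(&mut self, key: &str, value: bool) {
        self.bools.insert(key.to_string(), value);
    }

    fn get_bool(&self, key: &str) -> Option<bool> {
        self.bools.get(key).copied()
    }

    /// Typed, chainable setter: the key string lives in one place and the
    /// caller states intent directly (e.g. "do not clear schema metadata").
    fn with_parquet_skip_metadata(mut self, skip: bool) -> Self {
        self.set_bool(SKIP_METADATA_KEY, skip);
        self
    }
}
```

The design trade-off is that the string-keyed setter stays available for generic tooling (such as `SET` statements), while the typed wrapper gives Rust callers compile-time checking of the setting name.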


