Posted to commits@arrow.apache.org by ag...@apache.org on 2022/12/15 05:14:36 UTC
[arrow-datafusion] branch master updated: fix config descriptions for OPT_COLLECT_STATISTICS and OPT_REPARTITION_WINDOWS (#4623)

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
     new 3611d911a fix config descriptions for OPT_COLLECT_STATISTICS and OPT_REPARTITION_WINDOWS (#4623)
3611d911a is described below

commit 3611d911a3c9f3740bb1fc0527198be39ff47bfd
Author: Andy Grove <an...@gmail.com>
AuthorDate: Wed Dec 14 22:14:31 2022 -0700

    fix config descriptions for OPT_COLLECT_STATISTICS and OPT_REPARTITION_WINDOWS (#4623)
---
 datafusion/core/src/config.rs     | 6 +++---
 docs/source/user-guide/configs.md | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/datafusion/core/src/config.rs b/datafusion/core/src/config.rs
index 091721554..1c98c83ca 100644
--- a/datafusion/core/src/config.rs
+++ b/datafusion/core/src/config.rs
@@ -273,14 +273,14 @@ impl BuiltInConfigs {
 
             ConfigDefinition::new_bool(
                 OPT_REPARTITION_WINDOWS,
-                "Should DataFusion collect statistics after listing files",
+                "Should DataFusion repartition data using the partitions keys to execute window \
+                 functions in parallel using the provided `target_partitions` level",
                 true
             ),
 
             ConfigDefinition::new_bool(
                 OPT_COLLECT_STATISTICS,
-                "Should DataFusion repartition data using the partitions keys to execute window \
-                 functions in parallel using the provided `target_partitions` level",
+                "Should DataFusion collect statistics after listing files",
                 false
             ),
 
diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
index 81b1ef20a..039981338 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -44,7 +44,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.batch_size                           | UInt64  | 8192    | Default batch size while creating new batches, it's especially useful for buffer-in-memory batches since creating tiny batches would results in too much metadata memory consumption.                                                                                                                                                                         |
 | datafusion.execution.coalesce_batches                     | Boolean | true    | When set to true, record batches will be examined between each operator and small batches will be coalesced into larger batches. This is helpful when there are highly selective filters or joins that could produce tiny output batches. The target batch size is determined by the configuration setting 'datafusion.execution.coalesce_target_batch_size'. |
 | datafusion.execution.coalesce_target_batch_size           | UInt64  | 4096    | Target batch size when coalescing batches. Uses in conjunction with the configuration setting 'datafusion.execution.coalesce_batches'.                                                                                                                                                                                                                        |
-| datafusion.execution.collect_statistics                   | Boolean | false   | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level                                                                                                                                                                                                             |
+| datafusion.execution.collect_statistics                   | Boolean | false   | Should DataFusion collect statistics after listing files                                                                                                                                                                                                                                                                                                      |
 | datafusion.execution.parquet.enable_page_index            | Boolean | false   | If true, uses parquet data page level metadata (Page Index) statistics to reduce the number of rows decoded.                                                                                                                                                                                                                                                  |
 | datafusion.execution.parquet.metadata_size_hint           | UInt64  | NULL    | If specified, the parquet reader will try and fetch the last `size_hint` bytes of the parquet file optimistically. If not specified, two read are required: One read to fetch the 8-byte parquet footer and another to fetch the metadata length encoded in the footer.                                                                                       |
 | datafusion.execution.parquet.pruning                      | Boolean | true    | If true, the parquet reader attempts to skip entire row groups based on the predicate in the query and the metadata (min/max values) stored in the parquet file.                                                                                                                                                                                              |
@@ -62,6 +62,6 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.optimizer.prefer_hash_join                     | Boolean | true    | When set to true, the physical plan optimizer will prefer HashJoin over SortMergeJoin. HashJoin can work more efficientlythan SortMergeJoin but consumes more memory. Defaults to true                                                                                                                                                                        |
 | datafusion.optimizer.repartition_aggregations             | Boolean | true    | Should DataFusion repartition data using the aggregate keys to execute aggregates in parallel using the provided `target_partitions` level                                                                                                                                                                                                                    |
 | datafusion.optimizer.repartition_joins                    | Boolean | true    | Should DataFusion repartition data using the join keys to execute joins in parallel using the provided `target_partitions` level                                                                                                                                                                                                                              |
-| datafusion.optimizer.repartition_windows                  | Boolean | true    | Should DataFusion collect statistics after listing files                                                                                                                                                                                                                                                                                                      |
+| datafusion.optimizer.repartition_windows                  | Boolean | true    | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level                                                                                                                                                                                                             |
 | datafusion.optimizer.skip_failed_rules                    | Boolean | true    | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail.                                                                                                                         |
 | datafusion.optimizer.top_down_join_key_reordering         | Boolean | true    | When set to true, the physical plan optimizer will run a top down process to reorder the join keys. Defaults to true                                                                                                                                                                                                                                          |