Posted to commits@arrow.apache.org by gi...@apache.org on 2023/07/12 19:45:15 UTC

[arrow-datafusion] branch asf-site updated: Publish built docs triggered by ad3b8f6e46d09fddecaa347e013c2913a7196d04

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new b8a38c7248 Publish built docs triggered by ad3b8f6e46d09fddecaa347e013c2913a7196d04
b8a38c7248 is described below

commit b8a38c72488818609f13a6e9b36ee80bcab11fcc
Author: github-actions[bot] <gi...@users.noreply.github.com>
AuthorDate: Wed Jul 12 19:45:09 2023 +0000

    Publish built docs triggered by ad3b8f6e46d09fddecaa347e013c2913a7196d04
---
 _sources/user-guide/configs.md.txt | 2 +-
 searchindex.js                     | 2 +-
 user-guide/configs.html            | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 32001b9664..4229e3af70 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -63,7 +63,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.optimizer.repartition_file_min_size             | 10485760   | Minimum total files size in bytes to perform file scan repartitioning.                                                                                                                                                                                                                                                                                                                                                           [...]
 | datafusion.optimizer.repartition_joins                     | true       | Should DataFusion repartition data using the join keys to execute joins in parallel using the provided `target_partitions` level                                                                                                                                                                                                                                                                                                 [...]
 | datafusion.optimizer.allow_symmetric_joins_without_pruning | true       | Should DataFusion allow symmetric hash joins for unbounded data sources even when its inputs do not have any ordering or filtering If the flag is not enabled, the SymmetricHashJoin operator will be unable to prune its internal buffers, resulting in certain join types - such as Full, Left, LeftAnti, LeftSemi, Right, RightAnti, and RightSemi - being produced only at the end of the execution. This is not typical in  [...]
-| datafusion.optimizer.repartition_file_scans                | true       | When set to true, file groups will be repartitioned to achieve maximum parallelism. Currently supported only for Parquet format in which case multiple row groups from the same file may be read concurrently. If false then each row group is read serially, though different files may be read in parallel.                                                                                                                    [...]
+| datafusion.optimizer.repartition_file_scans                | true       | When set to `true`, file groups will be repartitioned to achieve maximum parallelism. Currently Parquet and CSV formats are supported. If set to `true`, all files will be repartitioned evenly (i.e., a single large file might be partitioned into smaller chunks) for parallel scanning. If set to `false`, different files will be read in parallel, but repartitioning won't happen within a single file.                   [...]
 | datafusion.optimizer.repartition_windows                   | true       | Should DataFusion repartition data using the partitions keys to execute window functions in parallel using the provided `target_partitions` level                                                                                                                                                                                                                                                                                [...]
 | datafusion.optimizer.repartition_sorts                     | true       | Should DataFusion execute sorts in a per-partition fashion and merge afterwards instead of coalescing first and sorting globally. With this flag is enabled, plans in the form below `text "SortExec: [a@0 ASC]", " CoalescePartitionsExec", " RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=1", ` would turn into the plan below which performs better in multithreaded environments `text "SortPreserving [...]
 | datafusion.optimizer.skip_failed_rules                     | false      | When set to true, the logical plan optimizer will produce warning messages if any optimization rules produce errors and then proceed to the next rule. When set to false, any rules that produce errors will cause the query to fail                                                                                                                                                                                             [...]
diff --git a/searchindex.js b/searchindex.js
index 58d1be3d6c..f79b076f15 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "user-guide/cli", "user-guide/configs", "user-guide/dataframe", "user-guide/example-usage", "user-guide/expressions", "user-guide/faq", "use [...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "user-guide/cli", "user-guide/configs", "user-guide/dataframe", "user-guide/example-usage", "user-guide/expressions", "user-guide/faq", "use [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index c741a3cd9d..2dce0bb5ab 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -448,7 +448,7 @@ Environment variables are read during <code class="docutils literal notranslate"
 </tr>
 <tr class="row-even"><td><p>datafusion.optimizer.repartition_file_scans</p></td>
 <td><p>true</p></td>
-<td><p>When set to true, file groups will be repartitioned to achieve maximum parallelism. Currently supported only for Parquet format in which case multiple row groups from the same file may be read concurrently. If false then each row group is read serially, though different files may be read in parallel.</p></td>
+<td><p>When set to <code class="docutils literal notranslate"><span class="pre">true</span></code>, file groups will be repartitioned to achieve maximum parallelism. Currently Parquet and CSV formats are supported. If set to <code class="docutils literal notranslate"><span class="pre">true</span></code>, all files will be repartitioned evenly (i.e., a single large file might be partitioned into smaller chunks) for parallel scanning. If set to <code class="docutils literal notranslate"><s [...]
 </tr>
 <tr class="row-odd"><td><p>datafusion.optimizer.repartition_windows</p></td>
 <td><p>true</p></td>