Posted to commits@arrow.apache.org by gi...@apache.org on 2024/01/09 20:35:40 UTC

(arrow-datafusion) branch asf-site updated: Publish built docs triggered by be8a9536e6f8c7bbebd0e991901bf6acb22ec133

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 435cc008c2 Publish built docs triggered by be8a9536e6f8c7bbebd0e991901bf6acb22ec133
435cc008c2 is described below

commit 435cc008c271ae6bf2878fc3630dc785f67ceb90
Author: github-actions[bot] <gi...@users.noreply.github.com>
AuthorDate: Tue Jan 9 20:35:35 2024 +0000

    Publish built docs triggered by be8a9536e6f8c7bbebd0e991901bf6acb22ec133
---
 _sources/user-guide/configs.md.txt           |  2 +-
 _sources/user-guide/sql/write_options.md.txt | 36 ++++++++++++++--------------
 searchindex.js                               |  2 +-
 user-guide/configs.html                      |  2 +-
 user-guide/sql/write_options.html            |  2 +-
 5 files changed, 22 insertions(+), 22 deletions(-)
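
Note on the max_row_group_size wording changed below: the new text describes the setting as a target maximum number of rows per row group, with a default of 1048576. As a rough illustration only, the sketch below computes the smallest number of row groups a file could have under that cap. The function name and the sample row counts are hypothetical, not part of DataFusion, and the real writer may close a row group early and so produce more groups than this lower bound.

```python
# Illustrative arithmetic only; not DataFusion code.
# DEFAULT_MAX_ROW_GROUP_SIZE mirrors the default shown in the configs table.
DEFAULT_MAX_ROW_GROUP_SIZE = 1_048_576


def min_row_group_count(total_rows: int,
                        max_row_group_size: int = DEFAULT_MAX_ROW_GROUP_SIZE) -> int:
    """Return the fewest row groups needed if no group exceeds the cap."""
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if max_row_group_size <= 0:
        raise ValueError("max_row_group_size must be positive")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_rows + max_row_group_size - 1) // max_row_group_size


if __name__ == "__main__":
    for rows in (0, 1_048_576, 1_048_577, 10_000_000):
        print(rows, min_row_group_count(rows))
```

Raising the cap lowers this count, which is the trade-off the new description names: fewer, larger row groups cost more memory to write but can compress better and read faster.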

diff --git a/_sources/user-guide/configs.md.txt b/_sources/user-guide/configs.md.txt
index 0a5c221c50..4a379d374c 100644
--- a/_sources/user-guide/configs.md.txt
+++ b/_sources/user-guide/configs.md.txt
@@ -63,7 +63,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.parquet.dictionary_page_size_limit                 | 1048576                   | Sets best effort maximum dictionary page size, in bytes                                                                                                                                                                                                                                                                                                                                              [...]
 | datafusion.execution.parquet.statistics_enabled                         | NULL                      | Sets if statistics are enabled for any column Valid values are: "none", "chunk", and "page" These values are not case sensitive. If NULL, uses default parquet writer setting                                                                                                                                                                                                                        [...]
 | datafusion.execution.parquet.max_statistics_size                        | NULL                      | Sets max statistics size for any column. If NULL, uses default parquet writer setting                                                                                                                                                                                                                                                                                                                [...]
-| datafusion.execution.parquet.max_row_group_size                         | 1048576                   | Sets maximum number of rows in a row group                                                                                                                                                                                                                                                                                                                                                           [...]
+| datafusion.execution.parquet.max_row_group_size                         | 1048576                   | Target maximum number of rows in each row group (defaults to 1M rows). Writing larger row groups requires more memory to write, but can get better compression and be faster to read.                                                                                                                                                                                                                [...]
 | datafusion.execution.parquet.created_by                                 | datafusion version 34.0.0 | Sets "created by" property                                                                                                                                                                                                                                                                                                                                                                           [...]
 | datafusion.execution.parquet.column_index_truncate_length               | NULL                      | Sets column index truncate length                                                                                                                                                                                                                                                                                                                                                                    [...]
 | datafusion.execution.parquet.data_page_row_count_limit                  | 18446744073709551615      | Sets best effort maximum number of rows in data page                                                                                                                                                                                                                                                                                                                                                 [...]
diff --git a/_sources/user-guide/sql/write_options.md.txt b/_sources/user-guide/sql/write_options.md.txt
index 470591afaf..5321d11fcb 100644
--- a/_sources/user-guide/sql/write_options.md.txt
+++ b/_sources/user-guide/sql/write_options.md.txt
@@ -100,21 +100,21 @@ The following options are available when writing CSV files. Note: if any unsuppo
 
 The following options are available when writing parquet files. If any unsupported option is specified an error will be raised and the query will fail. If a column specific option is specified for a column which does not exist, the option will be ignored without error. For default values, see: [Configuration Settings](https://arrow.apache.org/datafusion/user-guide/configs.html).
 
-| Option                       | Can be Column Specific? | Description                                                                                                   |
-| ---------------------------- | ----------------------- | ------------------------------------------------------------------------------------------------------------- |
-| COMPRESSION                  | Yes                     | Sets the compression codec and if applicable compression level to use                                         |
-| MAX_ROW_GROUP_SIZE           | No                      | Sets the maximum number of rows that can be encoded in a single row group                                     |
-| DATA_PAGESIZE_LIMIT          | No                      | Sets the best effort maximum page size in bytes                                                               |
-| WRITE_BATCH_SIZE             | No                      | Maximum number of rows written for each column in a single batch                                              |
-| WRITER_VERSION               | No                      | Parquet writer version (1.0 or 2.0)                                                                           |
-| DICTIONARY_PAGE_SIZE_LIMIT   | No                      | Sets best effort maximum dictionary page size in bytes                                                        |
-| CREATED_BY                   | No                      | Sets the "created by" property in the parquet file                                                            |
-| COLUMN_INDEX_TRUNCATE_LENGTH | No                      | Sets the max length of min/max value fields in the column index.                                              |
-| DATA_PAGE_ROW_COUNT_LIMIT    | No                      | Sets best effort maximum number of rows in a data page.                                                       |
-| BLOOM_FILTER_ENABLED         | Yes                     | Sets whether a bloom filter should be written into the file.                                                  |
-| ENCODING                     | Yes                     | Sets the encoding that should be used (e.g. PLAIN or RLE)                                                     |
-| DICTIONARY_ENABLED           | Yes                     | Sets if dictionary encoding is enabled. Use this instead of ENCODING to set dictionary encoding.              |
-| STATISTICS_ENABLED           | Yes                     | Sets if statistics are enabled at PAGE or ROW_GROUP level.                                                    |
-| MAX_STATISTICS_SIZE          | Yes                     | Sets the maximum size in bytes that statistics can take up.                                                   |
-| BLOOM_FILTER_FPP             | Yes                     | Sets the false positive probability (fpp) for the bloom filter. Implicitly sets BLOOM_FILTER_ENABLED to true. |
-| BLOOM_FILTER_NDV             | Yes                     | Sets the number of distinct values (ndv) for the bloom filter. Implicitly sets bloom_filter_enabled to true.  |
+| Option                       | Can be Column Specific? | Description                                                                                                                         |
+| ---------------------------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
+| COMPRESSION                  | Yes                     | Sets the compression codec and if applicable compression level to use                                                               |
+| MAX_ROW_GROUP_SIZE           | No                      | Sets the maximum number of rows that can be encoded in a single row group. Larger row groups require more memory to write and read. |
+| DATA_PAGESIZE_LIMIT          | No                      | Sets the best effort maximum page size in bytes                                                                                     |
+| WRITE_BATCH_SIZE             | No                      | Maximum number of rows written for each column in a single batch                                                                    |
+| WRITER_VERSION               | No                      | Parquet writer version (1.0 or 2.0)                                                                                                 |
+| DICTIONARY_PAGE_SIZE_LIMIT   | No                      | Sets best effort maximum dictionary page size in bytes                                                                              |
+| CREATED_BY                   | No                      | Sets the "created by" property in the parquet file                                                                                  |
+| COLUMN_INDEX_TRUNCATE_LENGTH | No                      | Sets the max length of min/max value fields in the column index.                                                                    |
+| DATA_PAGE_ROW_COUNT_LIMIT    | No                      | Sets best effort maximum number of rows in a data page.                                                                             |
+| BLOOM_FILTER_ENABLED         | Yes                     | Sets whether a bloom filter should be written into the file.                                                                        |
+| ENCODING                     | Yes                     | Sets the encoding that should be used (e.g. PLAIN or RLE)                                                                           |
+| DICTIONARY_ENABLED           | Yes                     | Sets if dictionary encoding is enabled. Use this instead of ENCODING to set dictionary encoding.                                    |
+| STATISTICS_ENABLED           | Yes                     | Sets if statistics are enabled at PAGE or ROW_GROUP level.                                                                          |
+| MAX_STATISTICS_SIZE          | Yes                     | Sets the maximum size in bytes that statistics can take up.                                                                         |
+| BLOOM_FILTER_FPP             | Yes                     | Sets the false positive probability (fpp) for the bloom filter. Implicitly sets BLOOM_FILTER_ENABLED to true.                       |
+| BLOOM_FILTER_NDV             | Yes                     | Sets the number of distinct values (ndv) for the bloom filter. Implicitly sets bloom_filter_enabled to true.                        |
diff --git a/searchindex.js b/searchindex.js
index 7d49b858da..c0da25dfa7 100644
--- a/searchindex.js
+++ b/searchindex.js
@@ -1 +1 @@
-Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "library-user-guide/adding-udfs", "library-user-guide/building-logical-plans", "library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
+Search.setIndex({"docnames": ["contributor-guide/architecture", "contributor-guide/communication", "contributor-guide/index", "contributor-guide/quarterly_roadmap", "contributor-guide/roadmap", "contributor-guide/specification/index", "contributor-guide/specification/invariants", "contributor-guide/specification/output-field-name-semantic", "index", "library-user-guide/adding-udfs", "library-user-guide/building-logical-plans", "library-user-guide/catalogs", "library-user-guide/custom-tab [...]
\ No newline at end of file
diff --git a/user-guide/configs.html b/user-guide/configs.html
index 09ba59582d..0ffcf9569e 100644
--- a/user-guide/configs.html
+++ b/user-guide/configs.html
@@ -522,7 +522,7 @@ Environment variables are read during <code class="docutils literal notranslate"
 </tr>
 <tr class="row-even"><td><p>datafusion.execution.parquet.max_row_group_size</p></td>
 <td><p>1048576</p></td>
-<td><p>Sets maximum number of rows in a row group</p></td>
+<td><p>Target maximum number of rows in each row group (defaults to 1M rows). Writing larger row groups requires more memory to write, but can get better compression and be faster to read.</p></td>
 </tr>
 <tr class="row-odd"><td><p>datafusion.execution.parquet.created_by</p></td>
 <td><p>datafusion version 34.0.0</p></td>
diff --git a/user-guide/sql/write_options.html b/user-guide/sql/write_options.html
index b8b29901dd..3912cc7c06 100644
--- a/user-guide/sql/write_options.html
+++ b/user-guide/sql/write_options.html
@@ -559,7 +559,7 @@
 </tr>
 <tr class="row-odd"><td><p>MAX_ROW_GROUP_SIZE</p></td>
 <td><p>No</p></td>
-<td><p>Sets the maximum number of rows that can be encoded in a single row group</p></td>
+<td><p>Sets the maximum number of rows that can be encoded in a single row group. Larger row groups require more memory to write and read.</p></td>
 </tr>
 <tr class="row-even"><td><p>DATA_PAGESIZE_LIMIT</p></td>
 <td><p>No</p></td>