You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@arrow.apache.org by dh...@apache.org on 2023/09/30 05:23:02 UTC

[arrow-datafusion] branch main updated: Update Default Parquet Write Compression (#7692)

This is an automated email from the ASF dual-hosted git repository.

dheres pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new 692ea24357 Update Default Parquet Write Compression (#7692)
692ea24357 is described below

commit 692ea24357d32b1242c476f0ed33498c815ac921
Author: Devin D'Angelo <de...@gmail.com>
AuthorDate: Sat Sep 30 01:22:52 2023 -0400

    Update Default Parquet Write Compression (#7692)
    
    * update compression default
    
    * fix tests
    
    ---------
    
    Co-authored-by: Andrew Lamb <an...@nerdnetworks.org>
---
 datafusion/common/src/config.rs                           | 2 +-
 datafusion/sqllogictest/test_files/information_schema.slt | 2 +-
 docs/source/user-guide/configs.md                         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/datafusion/common/src/config.rs b/datafusion/common/src/config.rs
index b34c64ff88..261c2bf435 100644
--- a/datafusion/common/src/config.rs
+++ b/datafusion/common/src/config.rs
@@ -307,7 +307,7 @@ config_namespace! {
         /// lzo, brotli(level), lz4, zstd(level), and lz4_raw.
         /// These values are not case sensitive. If NULL, uses
         /// default parquet writer setting
-        pub compression: Option<String>, default = None
+        pub compression: Option<String>, default = Some("zstd(3)".into())
 
         /// Sets if dictionary encoding is enabled. If NULL, uses
         /// default parquet writer setting
diff --git a/datafusion/sqllogictest/test_files/information_schema.slt b/datafusion/sqllogictest/test_files/information_schema.slt
index f909010216..12aa9089a0 100644
--- a/datafusion/sqllogictest/test_files/information_schema.slt
+++ b/datafusion/sqllogictest/test_files/information_schema.slt
@@ -156,7 +156,7 @@ datafusion.execution.parquet.bloom_filter_enabled false
 datafusion.execution.parquet.bloom_filter_fpp NULL
 datafusion.execution.parquet.bloom_filter_ndv NULL
 datafusion.execution.parquet.column_index_truncate_length NULL
-datafusion.execution.parquet.compression NULL
+datafusion.execution.parquet.compression zstd(3)
 datafusion.execution.parquet.created_by datafusion
 datafusion.execution.parquet.data_page_row_count_limit 18446744073709551615
 datafusion.execution.parquet.data_pagesize_limit 1048576
diff --git a/docs/source/user-guide/configs.md b/docs/source/user-guide/configs.md
index 7fe229b4d3..638ac5a36b 100644
--- a/docs/source/user-guide/configs.md
+++ b/docs/source/user-guide/configs.md
@@ -58,7 +58,7 @@ Environment variables are read during `SessionConfig` initialisation so they mus
 | datafusion.execution.parquet.data_pagesize_limit           | 1048576                   | Sets best effort maximum size of data page in bytes                                                                                                                                                                                                                                                                                                                                                               [...]
 | datafusion.execution.parquet.write_batch_size              | 1024                      | Sets write_batch_size in bytes                                                                                                                                                                                                                                                                                                                                                                                    [...]
 | datafusion.execution.parquet.writer_version                | 1.0                       | Sets parquet writer version valid values are "1.0" and "2.0"                                                                                                                                                                                                                                                                                                                                                      [...]
-| datafusion.execution.parquet.compression                   | NULL                      | Sets default parquet compression codec Valid values are: uncompressed, snappy, gzip(level), lzo, brotli(level), lz4, zstd(level), and lz4_raw. These values are not case sensitive. If NULL, uses default parquet writer setting                                                                                                                                                                                  [...]
+| datafusion.execution.parquet.compression                   | zstd(3)                   | Sets default parquet compression codec Valid values are: uncompressed, snappy, gzip(level), lzo, brotli(level), lz4, zstd(level), and lz4_raw. These values are not case sensitive. If NULL, uses default parquet writer setting                                                                                                                                                                                  [...]
 | datafusion.execution.parquet.dictionary_enabled            | NULL                      | Sets if dictionary encoding is enabled. If NULL, uses default parquet writer setting                                                                                                                                                                                                                                                                                                                              [...]
 | datafusion.execution.parquet.dictionary_page_size_limit    | 1048576                   | Sets best effort maximum dictionary page size, in bytes                                                                                                                                                                                                                                                                                                                                                           [...]
 | datafusion.execution.parquet.statistics_enabled            | NULL                      | Sets if statistics are enabled for any column Valid values are: "none", "chunk", and "page" These values are not case sensitive. If NULL, uses default parquet writer setting                                                                                                                                                                                                                                     [...]