Posted to commits@spark.apache.org by do...@apache.org on 2022/08/10 19:11:35 UTC

[spark] branch master updated: [SPARK-39743][DOCS] Improve `spark.io.compression.(lz4|snappy|zstd)` conf scope description

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 25759a0de6d [SPARK-39743][DOCS] Improve `spark.io.compression.(lz4|snappy|zstd)` conf scope description
25759a0de6d is described below

commit 25759a0de6dd09ecc440d009fba6d661558e7261
Author: zzzzming95 <50...@qq.com>
AuthorDate: Wed Aug 10 12:10:54 2022 -0700

    [SPARK-39743][DOCS] Improve `spark.io.compression.(lz4|snappy|zstd)` conf scope description
    
    ### What changes were proposed in this pull request?
    
    Update the descriptions of several `spark.io.compression.*` configurations to clarify their application scope.
    
    ### Why are the changes needed?
    
    Users are easily misled into thinking that these parameters also apply to Spark SQL.
    
    For example, a user wants to write a Parquet file with a particular compression setting, such as the zstd level (as in the code below). The configuration documentation lists `spark.io.compression.zstd.level`, which looks like the desired setting, but it turns out to have no effect, which leaves the user confused.
    
    ```
        // Attempting to set the zstd level for a Parquet write. This has no
        // effect: spark.io.compression.zstd.level only applies to Spark's
        // internal I/O codec (spark.io.compression.codec).
        // Note: newHadoopConfWithOptions takes a Map[String, String].
        sparkSession.sessionState.newHadoopConfWithOptions(
          Map("spark.io.compression.zstd.level" -> "10")
        )
    
        df.coalesce(1).write.parquet("file:///home/test_data/nn_parq_10")
    ```
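    
    For Parquet output, the codec is chosen per write (or via `spark.sql.parquet.compression.codec`), and the zstd level is a parquet-mr Hadoop property. A minimal sketch of what the user likely intended, assuming parquet-mr's `parquet.compression.codec.zstd.level` property (not part of this PR; the path and level are illustrative):
    
    ```
        // Tell parquet-mr which zstd level to use via the Hadoop configuration.
        // (parquet.compression.codec.zstd.level is a parquet-mr property,
        // assumed here, not a Spark configuration.)
        sparkSession.sparkContext.hadoopConfiguration
          .set("parquet.compression.codec.zstd.level", "10")
    
        // Select zstd as the Parquet codec for this write.
        df.coalesce(1)
          .write
          .option("compression", "zstd")
          .parquet("file:///home/test_data/nn_parq_10")
    ```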
    
    ### Does this PR introduce _any_ user-facing change?
    
    no
    
    ### How was this patch tested?
    
    Documentation-only change; no new tests are needed.
    
    Closes #37416 from zzzzming95/SPARK-39743-doc.
    
    Authored-by: zzzzming95 <50...@qq.com>
    Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
 docs/configuration.md | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/docs/configuration.md b/docs/configuration.md
index 957c430c37b..55e595ad301 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1530,7 +1530,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Block size used in LZ4 compression, in the case when LZ4 compression codec
     is used. Lowering this block size will also lower shuffle memory usage when LZ4 is used.
-    Default unit is bytes, unless otherwise specified.
+    Default unit is bytes, unless otherwise specified. This configuration only applies to
+    `spark.io.compression.codec`.
   </td>
   <td>1.4.0</td>
 </tr>
@@ -1540,7 +1541,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Block size in Snappy compression, in the case when Snappy compression codec is used. 
     Lowering this block size will also lower shuffle memory usage when Snappy is used.
-    Default unit is bytes, unless otherwise specified.
+    Default unit is bytes, unless otherwise specified. This configuration only applies
+    to `spark.io.compression.codec`.
   </td>
   <td>1.4.0</td>
 </tr>
@@ -1549,7 +1551,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>1</td>
   <td>
     Compression level for Zstd compression codec. Increasing the compression level will result in better
-    compression at the expense of more CPU and memory.
+    compression at the expense of more CPU and memory. This configuration only applies to 
+    `spark.io.compression.codec`.
   </td>
   <td>2.3.0</td>
 </tr>
@@ -1559,7 +1562,8 @@ Apart from these, the following properties are also available, and may be useful
   <td>
     Buffer size in bytes used in Zstd compression, in the case when Zstd compression codec
     is used. Lowering this size will lower the shuffle memory usage when Zstd is used, but it
-    might increase the compression cost because of excessive JNI call overhead.
+    might increase the compression cost because of excessive JNI call overhead. This
+    configuration only applies to `spark.io.compression.codec`.
   </td>
   <td>2.3.0</td>
 </tr>
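
Where these settings do apply is Spark's internal I/O codec (e.g. shuffle
outputs and broadcast variables) selected by `spark.io.compression.codec`,
which must be set before the SparkContext starts. A minimal sketch, assuming
local mode and illustrative values (not part of this commit):

```
import org.apache.spark.sql.SparkSession

// spark.io.compression.* settings configure Spark's internal I/O codec,
// so they go on the conf before the SparkContext is created.
val spark = SparkSession.builder()
  .master("local[*]")                                     // illustrative
  .config("spark.io.compression.codec", "zstd")
  .config("spark.io.compression.zstd.level", "3")         // compression level
  .config("spark.io.compression.zstd.bufferSize", "32k")  // buffer for JNI calls
  .getOrCreate()
```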

