Posted to commits@spark.apache.org by gu...@apache.org on 2021/08/05 01:17:56 UTC
[spark] branch branch-3.2 updated: [SPARK-36384][CORE][DOC] Add doc for shuffle checksum
This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.2 by this push:
new cc2a5ab [SPARK-36384][CORE][DOC] Add doc for shuffle checksum
cc2a5ab is described below
commit cc2a5abf7d56192704cf5c8f1bee0b07620c89e4
Author: yi.wu <yi...@databricks.com>
AuthorDate: Thu Aug 5 10:16:46 2021 +0900
[SPARK-36384][CORE][DOC] Add doc for shuffle checksum
### What changes were proposed in this pull request?
Add doc for the shuffle checksum configs in `configuration.md`.
### Why are the changes needed?
Documentation only.
### Does this PR introduce _any_ user-facing change?
No, since Spark 3.2 hasn't been released.
### How was this patch tested?
Passed existing tests.
Closes #33637 from Ngone51/SPARK-36384.
Authored-by: yi.wu <yi...@databricks.com>
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
(cherry picked from commit 3b92c721b5c08c76c3aad056d3170553d0b52f85)
Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
.../org/apache/spark/internal/config/package.scala | 13 ++++++++-----
docs/configuration.md | 18 ++++++++++++++++++
2 files changed, 26 insertions(+), 5 deletions(-)
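For reference, the two configs documented by this patch can be set like any other Spark property, e.g. in `spark-defaults.conf`. A minimal sketch (the values shown are the defaults introduced by this change):

```
# Shuffle checksum settings added in Spark 3.2 (values shown are the defaults)
spark.shuffle.checksum.enabled   true
spark.shuffle.checksum.algorithm ADLER32
```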
diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala
index 60ba3aa..17c585d 100644
--- a/core/src/main/scala/org/apache/spark/internal/config/package.scala
+++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala
@@ -1370,21 +1370,24 @@ package object config {
private[spark] val SHUFFLE_CHECKSUM_ENABLED =
ConfigBuilder("spark.shuffle.checksum.enabled")
- .doc("Whether to calculate the checksum of shuffle output. If enabled, Spark will try " +
- "its best to tell if shuffle data corruption is caused by network or disk or others.")
+ .doc("Whether to calculate the checksum of shuffle data. If enabled, Spark will calculate " +
+ "the checksum values for each partition data within the map output file and store the " +
+ "values in a checksum file on the disk. When shuffle data corruption is detected, " +
+ "Spark will try to diagnose the cause (e.g., network issue, disk issue, etc.) of the " +
+ "corruption by using the checksum file.")
.version("3.2.0")
.booleanConf
.createWithDefault(true)
private[spark] val SHUFFLE_CHECKSUM_ALGORITHM =
ConfigBuilder("spark.shuffle.checksum.algorithm")
- .doc("The algorithm used to calculate the checksum. Currently, it only supports" +
- " built-in algorithms of JDK.")
+ .doc("The algorithm used to calculate the shuffle checksum. Currently, it only supports " +
+ "built-in algorithms of JDK.")
.version("3.2.0")
.stringConf
.transform(_.toUpperCase(Locale.ROOT))
.checkValue(Set("ADLER32", "CRC32").contains, "Shuffle checksum algorithm " +
- "should be either Adler32 or CRC32.")
+ "should be either ADLER32 or CRC32.")
.createWithDefault("ADLER32")
private[spark] val SHUFFLE_COMPRESS =
diff --git a/docs/configuration.md b/docs/configuration.md
index f7db4c2..a4fdc4c 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1032,6 +1032,24 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>1.6.0</td>
</tr>
+<tr>
+ <td><code>spark.shuffle.checksum.enabled</code></td>
+ <td>true</td>
+ <td>
+ Whether to calculate the checksum of shuffle data. If enabled, Spark will calculate the checksum values for each partition
+ data within the map output file and store the values in a checksum file on the disk. When shuffle data corruption is
+ detected, Spark will try to diagnose the cause (e.g., network issue, disk issue, etc.) of the corruption by using the checksum file.
+ </td>
+ <td>3.2.0</td>
+</tr>
+<tr>
+ <td><code>spark.shuffle.checksum.algorithm</code></td>
+ <td>ADLER32</td>
+ <td>
+ The algorithm used to calculate the shuffle checksum. Currently, it only supports built-in algorithms of JDK, e.g., ADLER32, CRC32.
+ </td>
+ <td>3.2.0</td>
+</tr>
</table>
### Spark UI
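The "built-in algorithms of JDK" that `spark.shuffle.checksum.algorithm` refers to are the `java.util.zip` implementations behind the `Checksum` interface. A minimal sketch of computing both supported checksums over the same bytes (the class and helper names below are illustrative, not Spark internals):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ShuffleChecksumDemo {
    // Illustrative helper: feed a byte buffer through a JDK Checksum
    // implementation and return its 32-bit value (widened to long).
    static long checksumOf(Checksum c, byte[] data) {
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        // Stand-in for one partition's bytes in a map output file.
        byte[] partitionBytes = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Standard check values for this input:
        // ADLER32=091E01DE, CRC32=CBF43926
        System.out.printf("ADLER32=%08X%n", checksumOf(new Adler32(), partitionBytes));
        System.out.printf("CRC32=%08X%n", checksumOf(new CRC32(), partitionBytes));
    }
}
```

Both algorithms produce a 32-bit value; ADLER32 is generally cheaper to compute, which is presumably why it is the default here.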
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org