Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/06 13:05:46 UTC

[GitHub] [spark] dcoliversun opened a new pull request, #38131: [SPARK-40675][DOCS] Add missing spark configuration to configuration.md

dcoliversun opened a new pull request, #38131:
URL: https://github.com/apache/spark/pull/38131

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   




[GitHub] [spark] dcoliversun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Supplement missing spark configurations in `configuration.md` (part 1)

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989548455


##########
docs/configuration.md:
##########
@@ -847,6 +911,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.1.0</td>
 </tr>
+<tr>
+  <td><code>spark.plugins</code></td>

Review Comment:
   Yeah, I see `spark.plugins` is already mentioned in `monitoring.md`.
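   For reference, any class listed in `spark.plugins` just implements the `org.apache.spark.api.plugin.SparkPlugin` interface; a minimal sketch (the class name is hypothetical, and returning null simply means no component on that side):

       import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, SparkPlugin}

       // Hypothetical no-op plugin: provides neither a driver-side nor an executor-side component.
       class ExamplePlugin extends SparkPlugin {
         override def driverPlugin(): DriverPlugin = null
         override def executorPlugin(): ExecutorPlugin = null
       }

   It would then be enabled with, for example, `spark.plugins=com.example.ExamplePlugin` (comma-separated if there are several).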





[GitHub] [spark] srowen closed pull request #38131: [SPARK-40675][DOCS] Supplement undocumented spark configurations in `configuration.md`

Posted by GitBox <gi...@apache.org>.
srowen closed pull request #38131: [SPARK-40675][DOCS] Supplement undocumented spark configurations in `configuration.md`
URL: https://github.com/apache/spark/pull/38131




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Supplement missing spark configurations in `configuration.md` (part 1)

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989249282


##########
docs/configuration.md:
##########
@@ -468,6 +485,43 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.decommission.enabled</code></td>

Review Comment:
   Oh, I didn't realize that this is still undocumented. Thanks.





[GitHub] [spark] dcoliversun commented on pull request #38131: [SPARK-40675][DOCS] Supplement undocumented spark configurations in `configuration.md`

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on PR #38131:
URL: https://github.com/apache/spark/pull/38131#issuecomment-1272687515

   Thanks for your help @srowen @HyukjinKwon @dongjoon-hyun 




[GitHub] [spark] dcoliversun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Supplement missing spark configurations in `configuration.md` (part 1)

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989549303


##########
docs/configuration.md:
##########
@@ -349,6 +349,23 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.executor.allowSparkContext</code></td>

Review Comment:
   OK. Would it be better to mark it as an internal configuration? If so, I will open a new PR to address it.
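   If that is the direction, the change is presumably just adding `.internal()` to the entry in `core/src/main/scala/org/apache/spark/internal/config/package.scala`; a rough sketch, not the exact current definition:

       private[spark] val EXECUTOR_ALLOW_SPARK_CONTEXT =
         ConfigBuilder("spark.executor.allowSparkContext")
           .internal()  // assumption: internal entries are kept out of the user-facing docs
           .doc("If set to true, SparkContext can be created in executors.")
           .booleanConf
           .createWithDefault(false)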





[GitHub] [spark] dcoliversun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Add missing spark configuration to configuration.md

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989031705


##########
docs/configuration.md:
##########
@@ -1993,6 +1995,24 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.eventLog.gcMetrics.youngGenerationGarbageCollectors</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L195-L203





[GitHub] [spark] dcoliversun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Add missing spark configuration to configuration.md

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989182776


##########
docs/configuration.md:
##########
@@ -349,6 +349,23 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.executor.allowSparkContext</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2206-L2211



##########
docs/configuration.md:
##########
@@ -468,6 +485,43 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.decommission.enabled</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2133-L2144



##########
docs/configuration.md:
##########
@@ -847,6 +911,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.1.0</td>
 </tr>
+<tr>
+  <td><code>spark.plugins</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1715-L1723



##########
docs/configuration.md:
##########
@@ -1028,6 +1128,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.1.1</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.sort.io.plugin.class</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1320-L1325



##########
docs/configuration.md:
##########
@@ -1102,6 +1262,22 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.service.db.enabled</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L710-L716



##########
docs/configuration.md:
##########
@@ -1063,6 +1171,58 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.reduceLocality.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to compute locality preferences for reduce tasks.
+  </td>
+  <td>1.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.mapOutput.minSizeForBroadcast</code></td>
+  <td>512k</td>
+  <td>
+    The size at which we use Broadcast to send the map output statuses to the executors.
+  </td>
+  <td>2.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.detectCorrupt</code></td>
+  <td>true</td>
+  <td>
+    Whether to detect any corruption in fetched blocks.
+  </td>
+  <td>2.2.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.detectCorrupt.useExtraMemory</code></td>
+  <td>false</td>
+  <td>
+    If enabled, part of a compressed/encrypted stream will be de-compressed/de-crypted by using extra memory 
+    to detect early corruption. Any IOException thrown will cause the task to be retried once 
+    and if it fails again with same exception, then FetchFailedException will be thrown to retry previous stage.
+  </td>
+  <td>3.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.useOldFetchProtocol</code></td>
+  <td>false</td>
+  <td>
+    Whether to use the old protocol while doing the shuffle block fetching. It is only enabled while we need the 
+    compatibility in the scenario of new Spark version job fetching shuffle blocks from old version external shuffle service.
+  </td>
+  <td>3.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.readHostLocalDisk</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1611-L1618



##########
docs/configuration.md:
##########
@@ -468,6 +485,43 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    When decommission enabled, Spark will try its best to shut down the executor gracefully. 
+    Spark will try to migrate all the RDD blocks (controlled by <code>spark.storage.decommission.rddBlocks.enabled</code>)
+    and shuffle blocks (controlled by <code>spark.storage.decommission.shuffleBlocks.enabled</code>) from the decommissioning 
+    executor to a remote executor when <code>spark.storage.decommission.enabled</code> is enabled. 
+    With decommission enabled, Spark will also decommission an executor instead of killing when <code>spark.dynamicAllocation.enabled</code> enabled.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.decommission.killInterval</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2146-L2156
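   Taken together, the decommission entries quoted above amount to settings like the following (illustrative values only; RDD and shuffle block migration additionally depend on the `spark.storage.decommission.*` entries added later in this file):

       import org.apache.spark.SparkConf

       // Graceful executor decommissioning with block migration; values are examples.
       val conf = new SparkConf()
         .set("spark.decommission.enabled", "true")
         .set("spark.storage.decommission.enabled", "true")
         .set("spark.storage.decommission.rddBlocks.enabled", "true")
         .set("spark.storage.decommission.shuffleBlocks.enabled", "true")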



##########
docs/configuration.md:
##########
@@ -1891,6 +2093,24 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.files.ignoreCorruptFiles</code></td>
+  <td>false</td>
+  <td>
+    Whether to ignore corrupt files. If true, the Spark jobs will continue to run when encountering corrupted or 
+    non-existing files and contents that have been read will still be returned.
+  </td>
+  <td>2.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.files.ignoreMissingFiles</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1081-L1086



##########
docs/configuration.md:
##########
@@ -1944,6 +2164,67 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.9.2</td>
 </tr>
+<tr>
+  <td><code>spark.storage.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    Whether to decommission the block manager when decommissioning executor.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer shuffle blocks during block manager decommissioning. Requires a migratable shuffle resolver 
+    (like sort based shuffle).
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.maxThreads</code></td>
+  <td>8</td>
+  <td>
+    Maximum number of threads to use in migrating shuffle files.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.rddBlocks.enabled</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L474-L479



##########
docs/configuration.md:
##########
@@ -1944,6 +2164,67 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.9.2</td>
 </tr>
+<tr>
+  <td><code>spark.storage.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    Whether to decommission the block manager when decommissioning executor.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer shuffle blocks during block manager decommissioning. Requires a migratable shuffle resolver 
+    (like sort based shuffle).
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.maxThreads</code></td>
+  <td>8</td>
+  <td>
+    Maximum number of threads to use in migrating shuffle files.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.rddBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer RDD blocks during block manager decommissioning.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.fallbackStorage.path</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L502-L510



##########
docs/configuration.md:
##########
@@ -2321,6 +2630,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.4.1</td>
 </tr>
+<tr>
+  <td><code>spark.standalone.submit.waitAppCompletion</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2197-L2204



##########
docs/configuration.md:
##########
@@ -3360,6 +3688,15 @@ Push-based shuffle helps improve the reliability and performance of spark shuffl
   </td>
   <td>3.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.push.merge.finalizeThreads</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2330-L2338



##########
docs/configuration.md:
##########
@@ -1944,6 +2164,67 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.9.2</td>
 </tr>
+<tr>
+  <td><code>spark.storage.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    Whether to decommission the block manager when decommissioning executor.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer shuffle blocks during block manager decommissioning. Requires a migratable shuffle resolver 
+    (like sort based shuffle).
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.maxThreads</code></td>
+  <td>8</td>
+  <td>
+    Maximum number of threads to use in migrating shuffle files.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.rddBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer RDD blocks during block manager decommissioning.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.fallbackStorage.path</code></td>
+  <td>(none)</td>
+  <td>
+    The location for fallback storage during block manager decommissioning. For example, <code>s3a://spark-storage/</code>. 
+    In case of empty, fallback storage is disabled. The storage should be managed by TTL because Spark will not clean it up.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.fallbackStorage.cleanUp</code></td>
+  <td>false</td>
+  <td>
+    If true, Spark cleans up its fallback storage data during shutting down.
+  </td>
+  <td>3.2.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.maxDiskSize</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L519-L528



##########
docs/configuration.md:
##########
@@ -468,6 +485,43 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    When decommission enabled, Spark will try its best to shut down the executor gracefully. 
+    Spark will try to migrate all the RDD blocks (controlled by <code>spark.storage.decommission.rddBlocks.enabled</code>)
+    and shuffle blocks (controlled by <code>spark.storage.decommission.shuffleBlocks.enabled</code>) from the decommissioning 
+    executor to a remote executor when <code>spark.storage.decommission.enabled</code> is enabled. 
+    With decommission enabled, Spark will also decommission an executor instead of killing when <code>spark.dynamicAllocation.enabled</code> enabled.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.decommission.killInterval</code></td>
+  <td>(none)</td>
+  <td>
+    Duration after which a decommissioned executor will be killed forcefully by an outside (e.g. non-spark) service.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.decommission.forceKillTimeout</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2158-L2165



##########
docs/configuration.md:
##########
@@ -681,14 +735,24 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.redaction.regex</code></td>
-  <td>(?i)secret|password|token</td>
+  <td>(?i)secret|password|token|access[.]key</td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1116-L1124
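   Since this row only updates the documented default, nothing else changes; a user who wants extra patterns redacted would override the value, e.g. (illustrative pattern):

       import org.apache.spark.SparkConf

       // Also redact any property whose key or value matches "credential".
       val conf = new SparkConf()
         .set("spark.redaction.regex", "(?i)secret|password|token|access[.]key|credential")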



##########
docs/configuration.md:
##########
@@ -906,6 +978,23 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.4.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.unsafe.file.output.buffer</code></td>
+  <td>32k</td>
+  <td>
+    The file system for this buffer size after each partition is written in unsafe shuffle writer. 
+    In KiB unless otherwise specified.
+  </td>
+  <td>2.3.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.spill.diskWriteBufferSize</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1350-L1358



##########
docs/configuration.md:
##########
@@ -468,6 +485,43 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    When decommission enabled, Spark will try its best to shut down the executor gracefully. 
+    Spark will try to migrate all the RDD blocks (controlled by <code>spark.storage.decommission.rddBlocks.enabled</code>)
+    and shuffle blocks (controlled by <code>spark.storage.decommission.shuffleBlocks.enabled</code>) from the decommissioning 
+    executor to a remote executor when <code>spark.storage.decommission.enabled</code> is enabled. 
+    With decommission enabled, Spark will also decommission an executor instead of killing when <code>spark.dynamicAllocation.enabled</code> enabled.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.decommission.killInterval</code></td>
+  <td>(none)</td>
+  <td>
+    Duration after which a decommissioned executor will be killed forcefully by an outside (e.g. non-spark) service.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.decommission.forceKillTimeout</code></td>
+  <td>(none)</td>
+  <td>
+    Duration after which a Spark will force a decommissioning executor to exit. 
+    This should be set to a high value in most situations as low values will prevent block migrations from having enough time to complete.
+  </td>
+  <td>3.2.0</td>
+</tr>
+<tr>
+  <td><code>spark.executor.decommission.signal</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2167-L2172



##########
docs/configuration.md:
##########
@@ -988,6 +1077,17 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.service.name</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L731-L739



##########
docs/configuration.md:
##########
@@ -1063,6 +1171,58 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.reduceLocality.enabled</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1534-L1539



##########
docs/configuration.md:
##########
@@ -681,14 +735,24 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 <tr>
   <td><code>spark.redaction.regex</code></td>
-  <td>(?i)secret|password|token</td>
+  <td>(?i)secret|password|token|access[.]key</td>
   <td>
     Regex to decide which Spark configuration properties and environment variables in driver and
     executor environments contain sensitive information. When this regex matches a property key or
     value, the value is redacted from the environment UI and various logs like YARN and event logs.
   </td>
   <td>2.1.2</td>
 </tr>
+<tr>
+  <td><code>spark.redaction.string.regex</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1126-L1133



##########
docs/configuration.md:
##########
@@ -906,6 +978,23 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.4.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.unsafe.file.output.buffer</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1339-L1348



##########
docs/configuration.md:
##########
@@ -1063,6 +1171,58 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.reduceLocality.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to compute locality preferences for reduce tasks.
+  </td>
+  <td>1.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.mapOutput.minSizeForBroadcast</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1541-L1546



##########
docs/configuration.md:
##########
@@ -1063,6 +1171,58 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.reduceLocality.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to compute locality preferences for reduce tasks.
+  </td>
+  <td>1.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.mapOutput.minSizeForBroadcast</code></td>
+  <td>512k</td>
+  <td>
+    The size at which we use Broadcast to send the map output statuses to the executors.
+  </td>
+  <td>2.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.detectCorrupt</code></td>
+  <td>true</td>
+  <td>
+    Whether to detect any corruption in fetched blocks.
+  </td>
+  <td>2.2.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.detectCorrupt.useExtraMemory</code></td>
+  <td>false</td>
+  <td>
+    If enabled, part of a compressed/encrypted stream will be de-compressed/de-crypted by using extra memory 
+    to detect early corruption. Any IOException thrown will cause the task to be retried once 
+    and if it fails again with same exception, then FetchFailedException will be thrown to retry previous stage.
+  </td>
+  <td>3.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.useOldFetchProtocol</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1602-L1609



##########
docs/configuration.md:
##########
@@ -1063,6 +1171,58 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.reduceLocality.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to compute locality preferences for reduce tasks.
+  </td>
+  <td>1.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.mapOutput.minSizeForBroadcast</code></td>
+  <td>512k</td>
+  <td>
+    The size at which we use Broadcast to send the map output statuses to the executors.
+  </td>
+  <td>2.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.detectCorrupt</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1554-L1559



##########
docs/configuration.md:
##########
@@ -1735,6 +1911,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.6.0</td>
 </tr>
+<tr>
+  <td><code>spark.storage.unrollMemoryThreshold</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L400-L405



##########
docs/configuration.md:
##########
@@ -1063,6 +1171,58 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.reduceLocality.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to compute locality preferences for reduce tasks.
+  </td>
+  <td>1.5.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.mapOutput.minSizeForBroadcast</code></td>
+  <td>512k</td>
+  <td>
+    The size at which we use Broadcast to send the map output statuses to the executors.
+  </td>
+  <td>2.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.detectCorrupt</code></td>
+  <td>true</td>
+  <td>
+    Whether to detect any corruption in fetched blocks.
+  </td>
+  <td>2.2.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.detectCorrupt.useExtraMemory</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1561-L1569



##########
docs/configuration.md:
##########
@@ -1102,6 +1262,22 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.service.db.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to use db in ExternalShuffleService. Note that this only affects standalone mode.
+  </td>
+  <td>3.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.shuffle.service.db.backend</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L718-L726



##########
docs/configuration.md:
##########
@@ -1944,6 +2164,67 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.9.2</td>
 </tr>
+<tr>
+  <td><code>spark.storage.decommission.enabled</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L451-L456



##########
docs/configuration.md:
##########
@@ -1816,6 +2010,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.1.1</td>
 </tr>
+<tr>
+  <td><code>spark.broadcast.UDFCompressionThreshold</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1934-L1941



##########
docs/configuration.md:
##########
@@ -1745,6 +1929,16 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.storage.localDiskByExecutors.cacheSize</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1620-L1629



##########
docs/configuration.md:
##########
@@ -1891,6 +2093,24 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>1.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.files.ignoreCorruptFiles</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L1073-L1079



##########
docs/configuration.md:
##########
@@ -1944,6 +2164,67 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.9.2</td>
 </tr>
+<tr>
+  <td><code>spark.storage.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    Whether to decommission the block manager when decommissioning executor.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer shuffle blocks during block manager decommissioning. Requires a migratable shuffle resolver 
+    (like sort based shuffle).
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.maxThreads</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L466-L472



##########
docs/configuration.md:
##########
@@ -1944,6 +2164,67 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.9.2</td>
 </tr>
+<tr>
+  <td><code>spark.storage.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    Whether to decommission the block manager when decommissioning executor.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.enabled</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L458-L464



##########
docs/configuration.md:
##########
@@ -3342,6 +3661,15 @@ Push-based shuffle helps improve the reliability and performance of spark shuffl
   </td>
   <td>3.2.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.push.numPushThreads</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L2301-L2308



##########
docs/configuration.md:
##########
@@ -1944,6 +2164,67 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>0.9.2</td>
 </tr>
+<tr>
+  <td><code>spark.storage.decommission.enabled</code></td>
+  <td>false</td>
+  <td>
+    Whether to decommission the block manager when decommissioning executor.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer shuffle blocks during block manager decommissioning. Requires a migratable shuffle resolver 
+    (like sort based shuffle).
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.shuffleBlocks.maxThreads</code></td>
+  <td>8</td>
+  <td>
+    Maximum number of threads to use in migrating shuffle files.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.rddBlocks.enabled</code></td>
+  <td>true</td>
+  <td>
+    Whether to transfer RDD blocks during block manager decommissioning.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.fallbackStorage.path</code></td>
+  <td>(none)</td>
+  <td>
+    The location for fallback storage during block manager decommissioning. For example, <code>s3a://spark-storage/</code>. 
+    In case of empty, fallback storage is disabled. The storage should be managed by TTL because Spark will not clean it up.
+  </td>
+  <td>3.1.0</td>
+</tr>
+<tr>
+  <td><code>spark.storage.decommission.fallbackStorage.cleanUp</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L512-L517





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Supplement missing spark configurations in `configuration.md` (part 1)

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989248499


##########
docs/configuration.md:
##########
@@ -349,6 +349,23 @@ of the most common options to set are:
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.executor.allowSparkContext</code></td>

Review Comment:
   BTW, I guess we don't want to expose this, @dcoliversun. It does no good for users.





[GitHub] [spark] dcoliversun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Add missing spark configuration to configuration.md

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989032175


##########
docs/configuration.md:
##########
@@ -1993,6 +1995,24 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.0.0</td>
 </tr>
+<tr>
+  <td><code>spark.eventLog.gcMetrics.youngGenerationGarbageCollectors</code></td>
+  <td>Copy,PS Scavenge,ParNew,G1 Young Generation</td>
+  <td>
+    Names of supported young generation garbage collector. A name usually is the return of GarbageCollectorMXBean.getName.
+    The built-in young generation garbage collectors are Copy,PS Scavenge,ParNew,G1 Young Generation.
+  </td>
+  <td>3.0.0</td>
+</tr>
+<tr>
+  <td><code>spark.eventLog.gcMetrics.oldGenerationGarbageCollectors</code></td>

Review Comment:
   https://github.com/apache/spark/blob/22483167e20208e40e24abe6898b2102ddaf4fc9/core/src/main/scala/org/apache/spark/internal/config/package.scala#L205-L213
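   The names these two configs accept are whatever the running JVM reports; a quick way to list them (plain JDK API, Scala sketch):

       import java.lang.management.ManagementFactory

       // Prints the collector names, e.g. "G1 Young Generation" / "G1 Old Generation" on a G1 JVM.
       ManagementFactory.getGarbageCollectorMXBeans.forEach(b => println(b.getName))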





[GitHub] [spark] AmplabJenkins commented on pull request #38131: [SPARK-40675][DOCS] Supplement missing spark configurations in `configuration.md` (part 1)

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38131:
URL: https://github.com/apache/spark/pull/38131#issuecomment-1270724805

   Can one of the admins verify this patch?




[GitHub] [spark] dcoliversun commented on pull request #38131: [SPARK-40675][DOCS] Supplement missing spark configurations in `configuration.md` (part 1)

Posted by GitBox <gi...@apache.org>.
dcoliversun commented on PR #38131:
URL: https://github.com/apache/spark/pull/38131#issuecomment-1270301939

   cc @HyukjinKwon @dongjoon-hyun 
   It would be nice if you have time to review this PR :)




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38131: [SPARK-40675][DOCS] Supplement missing spark configurations in `configuration.md` (part 1)

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #38131:
URL: https://github.com/apache/spark/pull/38131#discussion_r989251034


##########
docs/configuration.md:
##########
@@ -847,6 +911,14 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>2.1.0</td>
 </tr>
+<tr>
+  <td><code>spark.plugins</code></td>

Review Comment:
   Yes, I agree we should have this. It seems we only mention it in `monitoring.md`.


