Posted to commits@spark.apache.org by rx...@apache.org on 2016/11/16 23:08:06 UTC
spark git commit: [YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service

Repository: spark
Updated Branches:
  refs/heads/master 2ca8ae9aa -> 55589987b


[YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service

## What changes were proposed in this pull request?

Suggest that users increase the `NodeManager` heap size when the `External Shuffle Service` is enabled.
Otherwise the `NM` can spend a lot of time in GC, which drives up `Shuffle Read blocked time` and makes shuffle operations a bottleneck.
GC can also cause the `NodeManager` to use an enormous amount of CPU, degrading cluster performance.
I have seen the NodeManager use 5-13G of RAM and up to 2700% CPU with the `spark_shuffle` service on.

## How was this patch tested?

#### Added step 5:
![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)

Author: Artur Sukhenko <ar...@gmail.com>

Closes #15906 from Devian-ua/nmHeapSize.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/55589987
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/55589987
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/55589987

Branch: refs/heads/master
Commit: 55589987be89ff78dadf44498352fbbd811a206e
Parents: 2ca8ae9
Author: Artur Sukhenko <ar...@gmail.com>
Authored: Wed Nov 16 15:08:01 2016 -0800
Committer: Reynold Xin <rx...@databricks.com>
Committed: Wed Nov 16 15:08:01 2016 -0800

----------------------------------------------------------------------
 docs/running-on-yarn.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/55589987/docs/running-on-yarn.md
----------------------------------------------------------------------
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index cd18808..fe0221c 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -559,6 +559,8 @@ pre-packaged distribution.
 1. In the `yarn-site.xml` on each node, add `spark_shuffle` to `yarn.nodemanager.aux-services`,
 then set `yarn.nodemanager.aux-services.spark_shuffle.class` to
 `org.apache.spark.network.yarn.YarnShuffleService`.
+1. Increase `NodeManager's` heap size by setting `YARN_HEAPSIZE` (1000 by default) in `etc/hadoop/yarn-env.sh` 
+to avoid garbage collection issues during shuffle. 
 1. Restart all `NodeManager`s in your cluster.
 
 The following extra configuration options are available when the shuffle service is running on YARN:
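
As an illustration of the added step, a line like the following could go in `etc/hadoop/yarn-env.sh` on each node. This is a minimal sketch: the variable name and its default of 1000 come from the patch above, while the value 4096 is only an assumed example and should be sized for the actual shuffle load and the memory available on the node.

```shell
# Heap size for YARN daemons such as the NodeManager (the patch states 1000 by default).
# 4096 is an illustrative value, not a recommendation from the patch.
export YARN_HEAPSIZE=4096
```

The new value takes effect only after the `NodeManager`s are restarted, which is the step that follows in the list.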

