Posted to dev@spark.apache.org by Artur Sukhenko <ar...@gmail.com> on 2016/11/15 20:45:43 UTC

NodeManager heap size with ExternalShuffleService

Hello guys,

When you enable the external shuffle service (the spark_shuffle aux-service)
in the NodeManager, there is no suggestion in the Spark docs, or anywhere
else, to increase the NM heap size. Shouldn't we include this in Spark's
documentation?
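
For context, this is the setup I mean. A minimal sketch, assuming a typical
Hadoop deployment, with the property names from the running-on-YARN docs.
In yarn-site.xml on each NodeManager:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

with spark_shuffle appended to whatever aux-services the cluster already
runs, and spark.shuffle.service.enabled=true on the Spark side. The shuffle
service then runs inside the NM JVM, so it shares the NM heap.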

I have seen the NM process take a lot of memory (5+ GB, against the default
1 GB heap), and when it hits GC pauses, Spark can become very slow while
tasks are shuffling. I don't think users are aware that the NM can become a
bottleneck.
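
For example (a minimal sketch, assuming a stock Apache Hadoop 2.x layout;
4096 is only an illustrative figure, not a tuned recommendation), the NM
heap can be raised in etc/hadoop/yarn-env.sh:

    # Max heap size in MB for the NodeManager daemon;
    # the launch scripts default to 1000 MB if nothing is set.
    export YARN_NODEMANAGER_HEAPSIZE=4096

YARN_HEAPSIZE works too, but it raises the heap of every YARN daemon
started through that file, not just the NodeManager.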


Sincerely,
Artur Sukhenko
--
Artur Sukhenko

Re: NodeManager heap size with ExternalShuffleService

Posted by Artur Sukhenko <ar...@gmail.com>.
Sure, Reynold.

Here is the pull request: [YARN][DOC] Increasing NodeManager's heap size with
External Shuffle Service <https://github.com/apache/spark/pull/15906>

On Wed, Nov 16, 2016, 04:07 Reynold Xin <rx...@databricks.com> wrote:

Can you submit a pull request to add that to the documentation?


--
Artur Sukhenko

Re: NodeManager heap size with ExternalShuffleService

Posted by Reynold Xin <rx...@databricks.com>.
Can you submit a pull request to add that to the documentation?

