Posted to hdfs-user@hadoop.apache.org by gmail <ju...@gmail.com> on 2015/10/06 12:13:39 UTC

Yarn doesn't start mappers fast enough

Hello everyone,

I have a problem with my yarn setup and hope you can help me. I already searched for this issue but didn't find anything.

My problem is that yarn doesn't start new mappers fast enough, which results in poor cluster utilization.

Setup:
 - 8 nodes, each with 64 cores and 128 GB of RAM
 - Hadoop version: 2.6.0
 - Standard Terasort of 100 GB, with the input data generated by teragen using two mappers

What I see: at most ~40 mappers run at the same time. At that point, the rate at which new mappers start seems to be about the same as the rate at which they finish. The average processing time of each mapper is about 34-40 s. If I start a second Terasort at the same time, it also runs at most ~40 mappers. It seems that 1) yarn correctly detects that it could run more, but 2) doesn't start new mappers fast enough (one at a time?).
What I expect: better utilization of all nodes, since there are 300+ map tasks.
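A quick sanity check on these numbers: if the job is in steady state, Little's law (L = λW) links the number of concurrently running mappers (L), the mapper runtime (W) and the rate at which mappers start (λ). The sketch below only does that arithmetic with the figures from this post; it does not diagnose yarn itself. The "one mapper per core" upper bound (8 × 64 = 512) is an assumption that ignores memory limits and the ApplicationMaster container.

```python
def start_rate(concurrent_mappers: float, runtime_s: float) -> float:
    """Little's law solved for the arrival rate: lambda = L / W.

    Returns how many mappers must start per second to keep
    `concurrent_mappers` running when each one takes `runtime_s` seconds.
    """
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    return concurrent_mappers / runtime_s


def rate_range(concurrent_mappers: float, min_runtime_s: float,
               max_runtime_s: float) -> tuple[float, float]:
    """Start-rate bounds over a runtime range (longer runtime -> lower rate)."""
    return (start_rate(concurrent_mappers, max_runtime_s),
            start_rate(concurrent_mappers, min_runtime_s))


if __name__ == "__main__":
    # Observed: ~40 concurrent mappers, 34-40 s per mapper.
    lo, hi = rate_range(40, 34, 40)
    print(f"observed: {lo:.2f}-{hi:.2f} mapper starts/s")

    # Assumed upper bound: one mapper per core on 8 nodes x 64 cores.
    cores = 8 * 64
    lo, hi = rate_range(cores, 34, 40)
    print(f"needed for {cores} concurrent mappers: {lo:.1f}-{hi:.1f} starts/s")
```

With these inputs the observed plateau corresponds to roughly 1.00-1.18 mapper starts per second, which fits the "one at a time?" suspicion, while keeping all 512 cores busy would need roughly 12.8-15.1 starts per second.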

Are there parameters to change this behavior? How can I tell yarn to start more instances at the same time?

For completeness:
    - the behavior doesn't change if I use more mappers during teragen.
    - the behavior doesn't change if I change the number of nodes.
    - I recompiled Hadoop for 64-bit according to https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
    - I use GPFS as the backend, with the IBM gpfs-connector.

Thanks in advance,
Jürgen