Posted to common-user@hadoop.apache.org by Joseph Naegele <jn...@grierforensics.com> on 2016/04/14 01:40:30 UTC

YarnChild and Container running beyond physical memory limits

Hi!

 

Can anyone tell me what exactly YarnChild is and how I can control the
quantity of child JVMs running in each container? In this case I'm concerned
with the map phase of my MR job. I'm having issues with my containers
running beyond *physical* memory limits and I'm trying to determine the
cause.

 

Is each child JVM just an individual map task? If so, why do I see a
variable number of them? I can't tell whether each of these JVMs is a clone
of the original YarnChild process, what they are doing, or why each of them
is using so much memory (~1 GB).

 

Here is a sample excerpt of my MR job when YARN kills a container:
https://gist.githubusercontent.com/naegelejd/ad3a58192a2df79775d80e3eac0ae49c/raw/808f998b1987c77ba1fe7fb41abab62ae07c5e02/job.log

Here's the same process tree reorganized and ordered by ancestry:
https://gist.githubusercontent.com/naegelejd/37afb27a6cf16ce918daeaeaf7450cdc/raw/b8809ce023840799f2cbbee28e49930671198ead/job.clean.log

 

If I increase the amount of memory per container, in turn lowering the total
number of containers, I see these errors less often as expected, BUT when I
do see them, there are NO child JVM processes and it's always due to some
other unrelated external process chewing up RAM. Here is an example of that:
https://gist.githubusercontent.com/naegelejd/32d63b0f9b9c148d1c1c7c0de3c2c317/raw/934a93a7afe09c7cd62a50edc08ce902b9e71aac/job.log.
You can see that the [redacted] process is the culprit in that case.

 

I can share my mapred/yarn configuration if it's helpful.
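
In case it helps, these are the settings I understand to be relevant. Below is a small
sketch that dumps them from whichever *-site.xml files are on the classpath; the class
name is made up for illustration, but the property keys are the standard MRv2/YARN ones:

import org.apache.hadoop.conf.Configuration;

public class DumpMemorySettings {                       // hypothetical helper class, illustration only
    public static void main(String[] args) {
        Configuration conf = new Configuration();       // loads core-default.xml / core-site.xml
        conf.addResource("mapred-site.xml");            // per-task settings
        conf.addResource("yarn-site.xml");              // NodeManager / scheduler settings
        String[] keys = {
            "mapreduce.map.memory.mb",                  // container size requested per map task
            "mapreduce.map.java.opts",                  // JVM options passed to each YarnChild (e.g. -Xmx)
            "yarn.scheduler.minimum-allocation-mb",
            "yarn.scheduler.maximum-allocation-mb",
            "yarn.nodemanager.resource.memory-mb",      // total memory a node offers to containers
            "yarn.nodemanager.pmem-check-enabled",      // the physical memory check that kills containers
            "yarn.nodemanager.vmem-pmem-ratio"
        };
        for (String key : keys) {
            System.out.printf("%-42s = %s%n", key, conf.get(key, "<not set>"));
        }
    }
}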

 

If anyone has any ideas I'd greatly appreciate them!

 

Thanks,

Joe


Re: YarnChild and Container running beyond physical memory limits

Posted by Varun Vasudev <vv...@apache.org>.
Hi Joseph,

YarnChild is the wrapper around the MR task process that actually carries out the work on the machine. From YarnChild.java -
/**
 * The main() for MapReduce task processes.
 */

In the snippets you provided, the memory monitor for YARN killed the map tasks because they exceeded the allocated memory:
Container [pid=30518,containerID=container_1460573911020_0002_01_000033] is running beyond physical memory limits. Current usage: 6.6 GB of 2.9 GB physical memory used; 17.6 GB of 11.7 GB virtual memory used. Killing container.
and
Container [pid=10124,containerID=container_1460478789757_0001_01_000020] is running beyond physical memory limits. Current usage: 5.4 GB of 5 GB physical memory used; 8.4 GB of 20 GB virtual memory used. Killing container.
> and it's always due to some other unrelated external process chewing up RAM.
This should not be the case. The way YARN determines memory usage is by walking down the process tree of the container. We don’t look at memory being used by external processes.
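As a rough illustration of that accounting (this is not YARN's monitor code, just the same idea): read /proc on the node, group processes by parent pid, and sum resident set size over everything descended from the container's root pid.

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Illustrative sketch only: attribute physical memory (RSS) to a single
// process tree by walking /proc. Pass the container's root pid as the argument.
public class ProcessTreeRss {
    public static void main(String[] args) throws IOException {
        long rootPid = Long.parseLong(args[0]);
        Map<Long, List<Long>> childrenOf = new HashMap<>();
        Map<Long, Long> rssPages = new HashMap<>();
        try (DirectoryStream<Path> procDirs =
                 Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path dir : procDirs) {
                try {
                    String stat = new String(Files.readAllBytes(dir.resolve("stat")));
                    // Fields come after the "(comm)" entry; comm may contain
                    // spaces, so split after the last ')'.
                    String[] rest = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
                    long pid = Long.parseLong(dir.getFileName().toString());
                    long ppid = Long.parseLong(rest[1]);   // 4th stat field: parent pid
                    long rss = Long.parseLong(rest[21]);   // 24th stat field: resident pages
                    childrenOf.computeIfAbsent(ppid, k -> new ArrayList<>()).add(pid);
                    rssPages.put(pid, rss);
                } catch (Exception e) {
                    // process exited while we were scanning; skip it
                }
            }
        }
        long totalPages = 0;
        Deque<Long> toVisit = new ArrayDeque<>();
        toVisit.push(rootPid);
        while (!toVisit.isEmpty()) {
            long pid = toVisit.pop();
            totalPages += rssPages.getOrDefault(pid, 0L);
            for (long child : childrenOf.getOrDefault(pid, Collections.emptyList())) {
                toVisit.push(child);
            }
        }
        long pageBytes = 4096;  // assumes 4 KiB pages; use `getconf PAGESIZE` for the real value
        System.out.printf("RSS of tree rooted at %d: %.1f MB%n",
                rootPid, totalPages * pageBytes / (1024.0 * 1024.0));
    }
}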
I would recommend increasing the amount of memory allocated for your map tasks until the job finishes (to figure out the upper limit of your map tasks) and going through your map code to see where it’s possible for memory usage to spike.
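A minimal sketch of bumping that limit at job setup time (the class name, job name, and numbers are placeholders, not a recommendation; the property names are the standard MRv2 ones). Keeping the YarnChild heap below the container size leaves room for the JVM's non-heap overhead:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithBiggerMaps {                      // hypothetical driver, values are placeholders
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("mapreduce.map.memory.mb", "4096");     // container size requested per map task
        // Keep the YarnChild heap below the container size so non-heap JVM
        // overhead (metaspace, thread stacks, direct buffers) fits inside it.
        conf.set("mapreduce.map.java.opts", "-Xmx3276m");
        Job job = Job.getInstance(conf, "memory-test");  // placeholder job name
        // ... set mapper class, input/output paths, etc., then submit the job
    }
}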
-Varun
