Posted to yarn-issues@hadoop.apache.org by "Manikandan R (JIRA)" <ji...@apache.org> on 2016/07/13 07:08:20 UTC

[jira] [Created] (YARN-5370) Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM

Manikandan R created YARN-5370:
----------------------------------

             Summary: Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM
                 Key: YARN-5370
                 URL: https://issues.apache.org/jira/browse/YARN-5370
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Manikandan R


I set yarn.nodemanager.delete.debug-delay-sec to 100+ days (in seconds) in my dev cluster, about 3-4 weeks ago. Since then, the NM occasionally crashes because of OOM. As a temporary fix, I have gradually increased its heap from 512 MB to 6 GB over the past few weeks, each time a crash occurred. Sometimes the NM won't start cleanly and only begins functioning after multiple tries. While analyzing a heap dump of the corresponding JVM, I found that DeletionService.java occupies almost 99% of the total allocated heap (-Xmx), something like this:

org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor @ 0x6c1d09068| 80 | 3,544,094,696 | 99.13%

Basically, a huge number of the above-mentioned deletion tasks are scheduled. Usually I see NM memory requirements of 2-4 GB for large clusters; in my case the cluster is very small, yet OOM still occurs.
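To illustrate why the queue grows like this: every deletion scheduled with a long debug delay stays in the executor's internal delay queue until the delay expires, so pending tasks (and heap usage) accumulate for the full delay window. The sketch below is not the actual DeletionService code, just a minimal standalone reproduction of that retention behaviour; the task count and 100-day delay are illustrative.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DelayQueueSketch {
    // Schedule `tasks` no-op "deletions" with a 100-day delay and return
    // how many are still waiting in the executor's internal queue.
    public static int pendingAfterScheduling(int tasks) {
        ScheduledThreadPoolExecutor exec = new ScheduledThreadPoolExecutor(4);
        long delaySec = TimeUnit.DAYS.toSeconds(100); // mimics debug-delay-sec
        for (int i = 0; i < tasks; i++) {
            // Each scheduled task is retained in the delay queue until
            // its delay expires -- this is where the heap goes.
            exec.schedule(() -> { /* would delete a container dir here */ },
                          delaySec, TimeUnit.SECONDS);
        }
        int pending = exec.getQueue().size();
        exec.shutdownNow(); // discard queued tasks so the JVM can exit
        return pending;
    }

    public static void main(String[] args) {
        // All 10,000 tasks remain queued, since none has expired yet.
        System.out.println(pendingAfterScheduling(10_000));
    }
}
```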

Is this expected behaviour? Or is there a limit we could impose on yarn.nodemanager.delete.debug-delay-sec to avoid this kind of issue?
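For reference, this is a minimal sketch of the yarn-site.xml fragment that produces the situation above; the roughly-100-day value (in seconds) is illustrative:

```xml
<property>
  <!-- Keep container directories/logs for ~100 days after completion
       before the DeletionService removes them.
       The default is 0 (delete immediately). -->
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>8640000</value>
</property>
```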



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
