Posted to yarn-issues@hadoop.apache.org by "Shurong Mai (JIRA)" <ji...@apache.org> on 2019/04/28 10:04:00 UTC

[jira] [Commented] (YARN-5449) nodemanager process is hung, and lost from resourcemanager

    [ https://issues.apache.org/jira/browse/YARN-5449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827893#comment-16827893 ] 

Shurong Mai commented on YARN-5449:
-----------------------------------

[~rohithsharma], thank you for your attention and advice. Before I created this issue, we had spent a long time analyzing it from many angles: JVM process thread stacks, JVM heap memory, different Java versions, OS logs, different OS versions, different OS file systems, and so on. However, we could not determine the root cause with certainty. Based on our analysis, the most probable explanation is that the nodemanager process itself is hung.

> nodemanager process is hung, and lost from resourcemanager
> ----------------------------------------------------------
>
>                 Key: YARN-5449
>                 URL: https://issues.apache.org/jira/browse/YARN-5449
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.2.0
>         Environment: The os version is 2.6.32-573.8.1.el6.x86_64 GNU/Linux
> The java version is jdk1.7.0_45
> The hadoop version is hadoop-2.2.0
>            Reporter: Shurong Mai
>            Priority: Major
>
> The nodemanager process is hung (it is not dead) and is lost from the resourcemanager.
> The nodemanager's log has stopped printing.
> The CPU usage of the nodemanager process is very low (nearly 0%).
> GC in the nodemanager JVM process has stopped; the output of jstat (jstat -gccause pid 1000 100) is as follows:
>   S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC                 
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
>   0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause
> The nodemanager JVM process hits this problem with both the CMS garbage collector and the G1 garbage collector.
> The parameters for the CMS garbage collector are as follows:
> -Xmx4096m -Xmn1024m -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled -XX:ConcGCThreads=4 -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=8 -XX:ParallelGCThreads=4 -XX:CMSInitiatingOccupancyFraction=70
> The parameters for the G1 garbage collector are as follows:
> -Xmx8g -Xms8g -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:G1ReservePercent=30 -XX:InitiatingHeapOccupancyPercent=45 -XX:ConcGCThreads=4 -XX:+PrintAdaptiveSizePolicy
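The key signal in the jstat output quoted above is that the YGC and FGC counters never advance between one-second samples, even though Eden sits at 95%. A minimal Java sketch of that check, assuming the `jstat -gccause` samples have already been captured as text lines; the class `JstatStallCheck` and method `gcCountersFrozen` are hypothetical names, not part of Hadoop or the JDK:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not a Hadoop or JDK API): decides whether a series of
 * `jstat -gccause` samples shows frozen GC counters, i.e. the young (YGC)
 * and full (FGC) collection counts never change across the sampled rows.
 */
public class JstatStallCheck {

    /** Returns true if YGC and FGC are identical in every data row. */
    public static boolean gcCountersFrozen(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no jstat output");
        }
        // Locate columns by header name: JDK 7 prints a P (PermGen) column,
        // while JDK 8+ prints M and CCS instead, so fixed indices would break.
        List<String> header = Arrays.asList(lines.get(0).trim().split("\\s+"));
        int ygc = header.indexOf("YGC");
        int fgc = header.indexOf("FGC");
        if (ygc < 0 || fgc < 0) {
            throw new IllegalArgumentException("header lacks YGC/FGC columns");
        }
        String firstYgc = null;
        String firstFgc = null;
        int samples = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // Numeric columns precede the free-text LGCC/GCC columns, so
            // whitespace splitting keeps header and data indices aligned.
            String[] cols = line.trim().split("\\s+");
            if (cols[0].equals(header.get(0)) || cols.length <= Math.max(ygc, fgc)) {
                continue; // skip repeated headers (jstat -h) and short rows
            }
            samples++;
            if (firstYgc == null) {
                firstYgc = cols[ygc];
                firstFgc = cols[fgc];
            } else if (!cols[ygc].equals(firstYgc) || !cols[fgc].equals(firstFgc)) {
                return false; // a collection happened between samples
            }
        }
        if (samples < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        sample.add("  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT    LGCC                 GCC");
        for (int i = 0; i < 3; i++) {
            sample.add("  0.00 100.00  95.06  24.08  30.46   3274  623.437     7    5.899  629.335 No GC                G1 Evacuation Pause");
        }
        System.out.println(gcCountersFrozen(sample)); // prints "true"
    }
}
```

Frozen counters are only suggestive on their own: a JVM that is idle and allocates nothing also stops collecting. That is why the report pairs this signal with the near-zero CPU usage and the silent log before concluding that the process is hung.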


