Posted to yarn-issues@hadoop.apache.org by "Eric Yang (JIRA)" <ji...@apache.org> on 2018/11/28 01:32:00 UTC

[jira] [Updated] (YARN-9067) YARN Resource Manager is running OOM because of leak of Configuration Object

     [ https://issues.apache.org/jira/browse/YARN-9067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Yang updated YARN-9067:
----------------------------
    Description: 
The Resource Manager runs out of memory every 2-3 days in the dev cluster.
 After analyzing the memory dump, it looks like HDFS is leaking Configuration objects, causing the YARN RM to run out of memory.
 GC Logs:
{code:java}
PSYoungGen      total 52736K, used 37813K [0x00000000eab00000, 0x00000000eec80000, 0x0000000100000000)
  eden space 39424K, 95% used [0x00000000eab00000,0x00000000ecfed620,0x00000000ed180000)
  from space 13312K, 0% used [0x00000000edf80000,0x00000000edf80000,0x00000000eec80000)
  to   space 13824K, 0% used [0x00000000ed180000,0x00000000ed180000,0x00000000edf00000)
 ParOldGen       total 699392K, used 699329K [0x00000000c0000000, 0x00000000eab00000, 0x00000000eab00000)
  object space 699392K, 99% used [0x00000000c0000000,0x00000000eaaf04a8,0x00000000eab00000)
 Metaspace       used 98178K, capacity 99932K, committed 100440K, reserved 1138688K
  class space    used 10481K, capacity 10829K, committed 10880K, reserved 1048576K
{code}
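To make the heap summary above easier to read, here is a minimal sketch that computes generation occupancy from the "total ...K, used ...K" lines. The class name OldGenOccupancy and the method occupancyPercent are hypothetical names chosen for illustration; only the two log lines come from this report.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OldGenOccupancy {
    // Matches the "total <n>K, used <n>K" part of a heap summary line.
    private static final Pattern SUMMARY =
        Pattern.compile("total (\\d+)K, used (\\d+)K");

    /** Returns used/total as a percentage, or -1 if the line does not match. */
    static double occupancyPercent(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            return -1;
        }
        long total = Long.parseLong(m.group(1));
        long used = Long.parseLong(m.group(2));
        return 100.0 * used / total;
    }

    public static void main(String[] args) {
        // Lines copied from the GC log in this report.
        String oldGen = " ParOldGen       total 699392K, used 699329K "
            + "[0x00000000c0000000, 0x00000000eab00000, 0x00000000eab00000)";
        String youngGen = "PSYoungGen      total 52736K, used 37813K "
            + "[0x00000000eab00000, 0x00000000eec80000, 0x0000000100000000)";

        // Locale.ROOT keeps the decimal separator stable across machines.
        System.out.println(String.format(Locale.ROOT,
            "ParOldGen  %.2f%% used", occupancyPercent(oldGen)));   // 99.99% used
        System.out.println(String.format(Locale.ROOT,
            "PSYoungGen %.2f%% used", occupancyPercent(youngGen))); // 71.70% used
    }
}
```

The old generation has only 63K free (699392K - 699329K), which is consistent with long-lived objects filling the heap rather than a burst of short-lived allocations.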
The heap holds more than 8K instances of org.apache.hadoop.conf.Configuration. The most frequent code path creating these Configuration objects comes from org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider, and all of these objects are retained in memory. See the attached screenshot for the path to the GC root for the conf object.
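For readers unfamiliar with this kind of leak, the sketch below shows the general pattern in plain Java: each request creates an object carrying its own configuration copy, and a long-lived collection keeps every copy reachable from a GC root. This is an illustration only, not a claim about where the leak in this issue actually is. FakeConf, FakeClient, RETAINED and handleRequest are hypothetical stand-ins and are not Hadoop classes or APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConfLeakSketch {
    /** Hypothetical stand-in for a configuration object; not the Hadoop class. */
    static final class FakeConf {
        final Map<String, String> props = new HashMap<>();

        FakeConf(FakeConf other) {
            if (other != null) {
                props.putAll(other.props); // copy constructor duplicates all entries
            }
        }
    }

    /** Hypothetical client that keeps its own copy of the configuration. */
    static final class FakeClient {
        final FakeConf conf;

        FakeClient(FakeConf base) {
            this.conf = new FakeConf(base);
        }
    }

    // Long-lived holder reachable from a GC root (a static field here).
    static final List<FakeClient> RETAINED = new ArrayList<>();

    static void handleRequest(FakeConf base) {
        // Each request adds a client and nothing ever removes it,
        // so its configuration copy can never be collected.
        RETAINED.add(new FakeClient(base));
    }

    public static void main(String[] args) {
        FakeConf base = new FakeConf(null);
        base.props.put("example.key", "example.value");
        for (int i = 0; i < 8000; i++) {
            handleRequest(base);
        }
        System.out.println("retained configuration copies: " + RETAINED.size());
    }
}
```

In a heap dump, a pattern like this shows up as thousands of configuration instances whose path to GC root runs through the same long-lived holder.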

 

  was:
The Resource Manager runs out of memory every 2-3 days in the edws dev cluster.
 After analyzing the memory dump, it looks like HDFS is leaking Configuration objects, causing the YARN RM to run out of memory.
 GC Logs:
{code:java}
PSYoungGen      total 52736K, used 37813K [0x00000000eab00000, 0x00000000eec80000, 0x0000000100000000)
  eden space 39424K, 95% used [0x00000000eab00000,0x00000000ecfed620,0x00000000ed180000)
  from space 13312K, 0% used [0x00000000edf80000,0x00000000edf80000,0x00000000eec80000)
  to   space 13824K, 0% used [0x00000000ed180000,0x00000000ed180000,0x00000000edf00000)
 ParOldGen       total 699392K, used 699329K [0x00000000c0000000, 0x00000000eab00000, 0x00000000eab00000)
  object space 699392K, 99% used [0x00000000c0000000,0x00000000eaaf04a8,0x00000000eab00000)
 Metaspace       used 98178K, capacity 99932K, committed 100440K, reserved 1138688K
  class space    used 10481K, capacity 10829K, committed 10880K, reserved 1048576K
{code}
The heap holds more than 8K instances of org.apache.hadoop.conf.Configuration. The most frequent code path creating these Configuration objects comes from org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider, and all of these objects are retained in memory. See the attached screenshot for the path to the GC root for the conf object.

 


> YARN Resource Manager is running OOM because of leak of Configuration Object
> ----------------------------------------------------------------------------
>
>                 Key: YARN-9067
>                 URL: https://issues.apache.org/jira/browse/YARN-9067
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>            Priority: Major
>         Attachments: image-2018-11-27-09-55-16-549.png
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
