Posted to yarn-dev@hadoop.apache.org by "徐鹏 (Jira)" <ji...@apache.org> on 2019/08/21 08:31:00 UTC

[jira] [Created] (YARN-9769) if "ContainerLocalizer Downloader" thread block ,it will never stop

徐鹏 created YARN-9769:
------------------------

             Summary: if "ContainerLocalizer Downloader" thread block ,it will never stop
                 Key: YARN-9769
                 URL: https://issues.apache.org/jira/browse/YARN-9769
             Project: Hadoop YARN
          Issue Type: Improvement
          Components: nodemanager
    Affects Versions: 2.5.0
         Environment: hadoop:2.5.0-cdh5.2.0
            Reporter: 徐鹏
         Attachments: nm_jstack

If a "ContainerLocalizer Downloader" thread blocks, it never stops, and the NodeManager JVM eventually runs out of memory. The NodeManager should fail a blocked "ContainerLocalizer Downloader" thread after a timeout.
 
In my case:
    *NM JVM main options*:
-XX:InitialHeapSize=2147483648 -XX:MaxGCPauseMillis=200 -XX:MaxHeapSize=2147483648 -XX:MaxNewSize=1287651328 -XX:MinHeapDeltaBytes=1048576 -XX:+UseG1GC
    *GC*: runs frequently but reclaims little (old gen >= 99%)
 
  !image-2019-08-20-23-39-23-968.png!
    *jstack & jmap*: 3602 "ContainerLocalizer Downloader" threads are blocked, 561MB in total
 
{code:java}
"ContainerLocalizer Downloader" #59288379 prio=5 os_prio=0 tid=0x00007f9c62d9d800 nid=0xb7550 waiting on condition [0x00007f9b1c2c0000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x000000008057ddb0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
        at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:254)
        at org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:432)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1016)
        at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:449)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:783)
        at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:717)
        at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:394)
        at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:305)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:590)
        - locked <0x00000000fa4ce540> (a org.apache.hadoop.hdfs.DFSInputStream)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:797)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:844)
        - locked <0x00000000fa4ce540> (a org.apache.hadoop.hdfs.DFSInputStream)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
        at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:264)
        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:60)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:356)
        at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:354)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1701)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:354)
        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:59)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
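
For anyone who wants to check a dump of their own for the same symptom, here is a minimal sketch that counts threads with a given name in saved jstack output, grouped by thread state. The class name JstackThreadCounter and its method are hypothetical helpers written for this illustration, not part of Hadoop or the JDK, and the parsing assumes the usual HotSpot text layout seen in the trace above (a quoted thread name on the header line, followed by a "java.lang.Thread.State:" line).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop or the JDK.
class JstackThreadCounter {
    private static final String STATE_PREFIX = "java.lang.Thread.State: ";

    /** Counts threads with exactly this name, grouped by their reported state. */
    static Map<String, Integer> countByState(String dump, String threadName) {
        Map<String, Integer> counts = new TreeMap<>();
        // A jstack header line starts with the quoted thread name.
        String header = "\"" + threadName + "\" ";
        boolean inTarget = false;
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // Start of a new thread entry: remember whether it is one we count.
                inTarget = line.startsWith(header);
            } else if (inTarget && line.trim().startsWith(STATE_PREFIX)) {
                String state = line.trim().substring(STATE_PREFIX.length());
                counts.merge(state, 1, Integer::sum);
                inTarget = false; // one state line per thread entry
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JstackThreadCounter <jstack-output-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        countByState(dump, "ContainerLocalizer Downloader")
                .forEach((state, n) -> System.out.println(n + "\t" + state));
    }
}
```

Run against a saved dump, this prints one line per thread state with the number of matching threads in that state.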
!image-2019-08-21-15-18-09-514.png!

!image-2019-08-21-15-18-27-553.png!

 

*ContainerLocalizer.class*

  !image-2019-08-21-16-21-01-610.png!

 

*Add loop termination*

   !image-2019-08-21-16-21-23-037.png!

[^nm_jstack]
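
The screenshots are not reproduced in this plain-text copy, so the following is not the patch from the images. It is only a minimal sketch of the general idea of bounding the wait, assuming the download runs as a Callable on an ExecutorService (as the FutureTask and ThreadPoolExecutor frames in the stack trace suggest). The class BoundedDownload and the method callWithTimeout are hypothetical names, and the timeout value is an arbitrary choice by the caller.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not the ContainerLocalizer code.
class BoundedDownload {
    /**
     * Submits the task and waits at most the given time for its result.
     * On timeout the task is cancelled and an IOException is thrown, so the
     * caller's loop can terminate instead of waiting forever.
     */
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task,
                                 long timeout, TimeUnit unit) throws Exception {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker thread. A thread parked in
            // awaitUninterruptibly() ignores interrupts, so it may stay
            // blocked; the timeout only stops the caller from waiting.
            future.cancel(true);
            throw new IOException("download timed out after " + timeout + " " + unit, e);
        } catch (ExecutionException e) {
            // Surface the task's own failure when it is a checked or runtime exception.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }
}
```

Note the limitation stated in the comment: because the blocked frame in the trace is awaitUninterruptibly, cancelling the future would let the localizer report a failure and move on, but it would not by itself release a thread that is already parked there.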



