Posted to yarn-dev@hadoop.apache.org by "Billie Rinaldi (JIRA)" <ji...@apache.org> on 2017/11/03 16:40:00 UTC

[jira] [Resolved] (YARN-7426) Interrupt does not work when LocalizerRunner is reading from InputStream

     [ https://issues.apache.org/jira/browse/YARN-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Billie Rinaldi resolved YARN-7426.
----------------------------------
    Resolution: Duplicate

> Interrupt does not work when LocalizerRunner is reading from InputStream
> ------------------------------------------------------------------------
>
>                 Key: YARN-7426
>                 URL: https://issues.apache.org/jira/browse/YARN-7426
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Priority: Critical
>
> When the NodeManager is overloaded and ContainerLocalizer processes hang, the containers time out and are cleaned up. The LocalizerRunner thread is interrupted during cleanup, but the interrupt has no effect while the thread is blocked reading from a FileInputStream. LocalizerRunner threads and ContainerLocalizer processes keep accumulating, which makes the node completely unresponsive. To avoid this, we can add a timeout to the shell command, similar to HADOOP-13817.
> The timeout value can be set by the AM, in the same way as the container timeout.
> ContainerLocalizer JVM stacktrace:
> {code}
> "main" #1 prio=5 os_prio=0 tid=0x00007fd8ec019000 nid=0xc295 runnable [0x00007fd8f3956000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.util.zip.ZipFile.open(Native Method)
> 	at java.util.zip.ZipFile.<init>(ZipFile.java:219)
> 	at java.util.zip.ZipFile.<init>(ZipFile.java:149)
> 	at java.util.jar.JarFile.<init>(JarFile.java:166)
> 	at java.util.jar.JarFile.<init>(JarFile.java:103)
> 	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:893)
> 	at sun.misc.URLClassPath$JarLoader.access$700(URLClassPath.java:756)
> 	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:838)
> 	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:831)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:830)
> 	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:803)
> 	at sun.misc.URLClassPath$3.run(URLClassPath.java:530)
> 	at sun.misc.URLClassPath$3.run(URLClassPath.java:520)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
> 	at sun.misc.URLClassPath.getLoader(URLClassPath.java:492)
> 	- locked <0x000000076ac75058> (a sun.misc.URLClassPath)
> 	at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:457)
> 	- locked <0x000000076ac75058> (a sun.misc.URLClassPath)
> 	at sun.misc.URLClassPath.getResource(URLClassPath.java:211)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> 	- locked <0x000000076ac7f960> (a java.lang.Object)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> 	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
> {code}
> NodeManager LocalizerRunner thread that is not interrupted:
> {code}
> "LocalizerRunner for container_e746_1508665985104_601806_01_000005" #3932753 prio=5 os_prio=0 tid=0x00007fb258d5f800 nid=0x11091 runnable [0x00007fb153946000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.FileInputStream.readBytes(Native Method)
>         at java.io.FileInputStream.read(FileInputStream.java:255)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>         - locked <0x0000000718502b80> (a java.lang.UNIXProcess$ProcessPipeInputStream)
>         at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
>         at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
>         at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at java.io.InputStreamReader.read(InputStreamReader.java:184)
>         at java.io.BufferedReader.fill(BufferedReader.java:161)
>         at java.io.BufferedReader.read1(BufferedReader.java:212)
>         at java.io.BufferedReader.read(BufferedReader.java:286)
>         - locked <0x0000000718502bd8> (a java.io.InputStreamReader)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1155)
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:930)
>         at org.apache.hadoop.util.Shell.run(Shell.java:848)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor.executePrivilegedOperation(PrivilegedOperationExecutor.java:151)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:264)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
> {code}
> NM log shows the LocalizerRunner is supposed to
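
The NodeManager trace shows LocalizerRunner blocked in FileInputStream.readBytes on a UNIXProcess pipe. FileInputStream is not an interruptible channel, so Thread.interrupt() only sets the thread's interrupt flag. The native read keeps waiting for data or EOF. The minimal reproduction below assumes a Unix-like host with `sleep` on the PATH. InterruptDemo and readerIgnoresInterrupt are hypothetical names, not YARN code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction (not YARN code): a thread blocked reading a
// child process's stdout ignores Thread.interrupt().
final class InterruptDemo {

    // Returns true if the reader is still alive waitMs after being interrupted.
    static boolean readerIgnoresInterrupt(long waitMs)
            throws IOException, InterruptedException {
        // "sleep 30" writes nothing and holds its stdout open until it exits.
        Process child = new ProcessBuilder("sleep", "30").start();
        InputStream stdout = child.getInputStream();
        Thread reader = new Thread(() -> {
            try {
                // Blocks in FileInputStream.readBytes, as in the trace above.
                while (stdout.read() != -1) {
                    // discard output
                }
            } catch (IOException ignored) {
                // destroyForcibly() below may close the stream under us.
            }
        }, "reader");
        reader.setDaemon(true);
        reader.start();

        Thread.sleep(waitMs);   // let the reader block inside read()
        reader.interrupt();     // roughly what container cleanup does to LocalizerRunner
        reader.join(waitMs);    // give the interrupt time to take effect
        boolean stillBlocked = reader.isAlive();

        // Only ending the child (EOF on the pipe) releases the reader.
        child.destroyForcibly();
        child.waitFor(10, TimeUnit.SECONDS);
        reader.join(10_000);
        return stillBlocked;
    }
}
```

Calling `InterruptDemo.readerIgnoresInterrupt(300)` on Linux is expected to return true. The interrupted reader stays blocked until the child is killed.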

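The description proposes putting a timeout on the shell command, similar to HADOOP-13817. The sketch below shows the general watchdog idea in plain Java, assuming a Unix-like host. A scheduled task kills the child if it outlives `timeoutMs`. The child's exit closes the pipe, which ends the blocked read where an interrupt could not. TimedCommand, run, and Result are hypothetical names. This is not Hadoop's Shell implementation, and it does not decide where the timeout value comes from (the description suggests the AM).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper (not Hadoop's Shell): run a command, capture stdout,
// and kill the child if it outlives timeoutMs.
final class TimedCommand {

    static final class Result {
        final String output;
        final int exitCode;
        final boolean timedOut;

        Result(String output, int exitCode, boolean timedOut) {
            this.output = output;
            this.exitCode = exitCode;
            this.timedOut = timedOut;
        }
    }

    static Result run(long timeoutMs, String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .start();
        AtomicBoolean timedOut = new AtomicBoolean(false);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Watchdog: interrupting the reading thread would not help, so kill
        // the child instead; its exit closes the pipe and ends the read.
        ScheduledFuture<?> watchdog = timer.schedule(() -> {
            if (process.isAlive()) {
                timedOut.set(true);
                process.destroyForcibly();
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);

        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        } catch (IOException e) {
            // destroyForcibly() also closes the parent's end of the pipe, so
            // a timed-out read may end with "Stream closed" instead of EOF.
            if (!timedOut.get()) {
                throw e;
            }
        } finally {
            watchdog.cancel(false);
            timer.shutdownNow();
        }
        int exitCode = process.waitFor();
        return new Result(out.toString(), exitCode, timedOut.get());
    }
}
```

With this shape, a hung command ends as a timed-out Result after roughly `timeoutMs` instead of pinning the calling thread indefinitely. One caveat applies: destroyForcibly() signals only the direct child. If that child has started subprocesses that inherited the pipe, they can keep the write end open and the read will still block. In that case, killing the whole process group may be needed.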


--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
