Posted to yarn-issues@hadoop.apache.org by "Ajith S (JIRA)" <ji...@apache.org> on 2017/05/24 03:46:04 UTC
[jira] [Commented] (YARN-6637) Deadlock in NativeIO
[ https://issues.apache.org/jira/browse/YARN-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022270#comment-16022270 ]
Ajith S commented on YARN-6637:
-------------------------------
Below are excerpts from the stack trace:
Thread1
{code}
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:739)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:224)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:208)
{code}
Thread2
{code}
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:160)
at org.apache.hadoop.mapred.ShuffleHandler$Shuffle$1.operationComplete(ShuffleHandler.java:1166)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
{code}
The above threads appear to be blocked by the two stacks below:
Stack1:
{code}
"New I/O worker #1" #135 prio=5 os_prio=0 tid=0x00007f1f60817800 nid=0x697d in Object.wait() [0x00007f1f4429a000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.<clinit>(NativeIO.java:184)
at org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:160)
at org.apache.hadoop.mapred.ShuffleHandler$Shuffle$1.operationComplete(ShuffleHandler.java:1166)
{code}
Stack2:
{code}
"ContainersLauncher #16" #365 prio=5 os_prio=0 tid=0x00007f1f49c8a800 nid=0x7cd0 in Object.wait() [0x00007f1f32891000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.io.nativeio.NativeIO.initNative(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO.<clinit>(NativeIO.java:645)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:739)
{code}
*Stack1* is blocked by *Stack2*: the *Stack1* thread needs the *NativeIO* class initialization to finish, so the problematic stack appears to be *Stack2*.
Next, in *Stack2* at NativeIO.java:645, initNative is a native call that tries to initialize the native-hadoop library.
In *Stack2*, it does this in NativeIO.c via *Java_org_apache_hadoop_io_nativeio_NativeIO_initNative*:
{code}
#define NATIVE_IO_POSIX_CLASS "org/apache/hadoop/io/nativeio/NativeIO$POSIX"

static void consts_init(JNIEnv *env) {
  jclass clazz = (*env)->FindClass(env, NATIVE_IO_POSIX_CLASS);
  /* ... */
}
{code}
i.e. it tries to load and initialize the class {{org.apache.hadoop.io.nativeio.NativeIO$POSIX}},
but *Stack1* is already in {{org.apache.hadoop.io.nativeio.NativeIO$POSIX.<clinit>}},
so the two class initializations deadlock and all threads hang.
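
The cycle described above can be reproduced in isolation. The following is a minimal sketch, not Hadoop code: {{Outer}} and {{Inner}} are hypothetical stand-ins for {{NativeIO}} and {{NativeIO$POSIX}}, and a latch is used (an assumption made only so the demo is deterministic) to make sure each thread has started initializing its own class before it touches the other one. The threads are daemons so the JVM can still exit.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-ins: Outer plays the role of NativeIO,
// Inner plays the role of NativeIO$POSIX. Not Hadoop code.
class ClassInitDeadlockDemo {

    // Opens only after both static initializers have started, so each thread
    // is already initializing "its" class before it touches the other one.
    private static final CountDownLatch BOTH_STARTED = new CountDownLatch(2);

    static void rendezvous() {
        BOTH_STARTED.countDown();
        try {
            BOTH_STARTED.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static class Outer {
        static {
            rendezvous();
            Inner.touch(); // requires Inner's initialization to complete
        }
        static void touch() { }
    }

    static class Inner {
        static {
            rendezvous();
            Outer.touch(); // requires Outer's initialization to complete
        }
        static void touch() { }
    }

    // Returns true if both initializing threads are still stuck after the timeout.
    static boolean deadlocked(long timeoutMillis) {
        // Plain lambdas, so the classes are first touched inside the new threads.
        Thread a = new Thread(() -> Outer.touch(), "init-outer");
        Thread b = new Thread(() -> Inner.touch(), "init-inner");
        a.setDaemon(true); // daemon threads do not keep the JVM alive
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(timeoutMillis);
            b.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.isAlive() && b.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked = " + deadlocked(2000));
    }
}
```

Each thread waits for the other thread's class initialization to finish, which is the same shape as *Stack1* and *Stack2* above. No Java-level monitor is involved, so a thread dump of this demo is likely to resemble the stacks in this comment rather than a reported monitor deadlock.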
> Deadlock in NativeIO
> --------------------
>
> Key: YARN-6637
> URL: https://issues.apache.org/jira/browse/YARN-6637
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Ajith S
> Assignee: Ajith S
> Priority: Critical
>