Posted to yarn-issues@hadoop.apache.org by "Ajith S (JIRA)" <ji...@apache.org> on 2017/05/24 03:46:04 UTC

[jira] [Commented] (YARN-6637) Deadlock in NativeIO

    [ https://issues.apache.org/jira/browse/YARN-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022270#comment-16022270 ] 

Ajith S commented on YARN-6637:
-------------------------------

Below are excerpts from the stack trace:

Thread1
{code}
  java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:739)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:224)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:208)
{code}
	
Thread2
{code}
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:160)
	at org.apache.hadoop.mapred.ShuffleHandler$Shuffle$1.operationComplete(ShuffleHandler.java:1166)
	at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
	at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
{code}

	
The above threads look to be blocked by the two stacks below:
	
Stack1:
{code}
"New I/O worker #1" #135 prio=5 os_prio=0 tid=0x00007f1f60817800 nid=0x697d in Object.wait() [0x00007f1f4429a000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.io.nativeio.NativeIO$POSIX.<clinit>(NativeIO.java:184)
	at org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:160)
	at org.apache.hadoop.mapred.ShuffleHandler$Shuffle$1.operationComplete(ShuffleHandler.java:1166)
{code}
	
Stack2:
{code}
"ContainersLauncher #16" #365 prio=5 os_prio=0 tid=0x00007f1f49c8a800 nid=0x7cd0 in Object.wait() [0x00007f1f32891000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.hadoop.io.nativeio.NativeIO.initNative(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO.<clinit>(NativeIO.java:645)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:739)
{code}
	
*Stack1* is blocked by *Stack2* because the *Stack1* thread needs *NativeIO* class initialization to finish, so the problematic stack looks to be *Stack2*.
Next, in *Stack2* at NativeIO.java:645, initNative is a native call, and it tries to initialize the native-hadoop library.

So in *Stack2*, it tries to do this in NativeIO.c via *Java_org_apache_hadoop_io_nativeio_NativeIO_initNative*:
{code}
#define NATIVE_IO_POSIX_CLASS "org/apache/hadoop/io/nativeio/NativeIO$POSIX"

static void consts_init(JNIEnv *env) {
  /* i.e. this needs class org.apache.hadoop.io.nativeio.NativeIO$POSIX */
  jclass clazz = (*env)->FindClass(env, NATIVE_IO_POSIX_CLASS);
  /* ... rest omitted ... */
}
{code}
   
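But *Stack1* is already in {{org.apache.hadoop.io.nativeio.NativeIO$POSIX.<clinit>}} and is itself waiting for *NativeIO* initialization to finish, so the two threads deadlock and all other threads that need either class hang.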
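
To make the cycle easier to see, here is a minimal pure-Java sketch of the same kind of class-initialization deadlock. It is not Hadoop code: {{Outer}} and {{Inner}} are hypothetical stand-ins for NativeIO and NativeIO$POSIX, and {{rendezvous}} is a helper added only to force the unlucky interleaving. In the real case one edge of the cycle goes through the native initNative call rather than a plain Java call, which this sketch does not reproduce.

```java
import java.util.concurrent.CountDownLatch;

public class ClassInitDeadlockDemo {
    // Released only once both static initializers have started running.
    static final CountDownLatch bothStarted = new CountDownLatch(2);

    // Hypothetical helper: forces both threads to be inside <clinit> at once.
    static void rendezvous() {
        bothStarted.countDown();
        try {
            bothStarted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stand-in for NativeIO: its initializer needs Inner to be initialized.
    static class Outer {
        static {
            rendezvous();
            Inner.touch();
        }
        static void touch() { }
    }

    // Stand-in for NativeIO$POSIX: its initializer needs Outer to be initialized.
    static class Inner {
        static {
            rendezvous();
            Outer.touch();
        }
        static void touch() { }
    }

    // Returns true if both threads are still stuck after the timeout.
    static boolean deadlocks(long timeoutMillis) {
        Thread t1 = new Thread(() -> Outer.touch(), "init-outer");
        Thread t2 = new Thread(() -> Inner.touch(), "init-inner");
        // Daemon threads so the JVM can still exit although they never finish.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(timeoutMillis);
            t2.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks(2000));
    }
}
```

Each thread holds the in-progress initialization of one class and waits for the other class to finish initializing, so neither can proceed. A thread dump taken while the sketch is stuck should show both threads inside {{<clinit>}} frames, similar to *Stack1* and *Stack2* above.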

> Deadlock in NativeIO
> --------------------
>
>                 Key: YARN-6637
>                 URL: https://issues.apache.org/jira/browse/YARN-6637
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ajith S
>            Assignee: Ajith S
>            Priority: Critical
>



