You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Ajith S (JIRA)" <ji...@apache.org> on 2015/08/11 09:00:54 UTC

[jira] [Resolved] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

     [ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajith S resolved HADOOP-9654.
-----------------------------
    Resolution: Duplicate

Closing the issues, if anyone disagree, please feel free to reopen

> IPC timeout doesn't seem to be kicking in
> -----------------------------------------
>
>                 Key: HADOOP-9654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9654
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.1.0-beta
>            Reporter: Roman Shaposhnik
>            Assignee: Ajith S
>
> During my Bigtop testing I made the NN OOM. This, in turn, made all of the clients stuck in the IPC call (even the new clients that I run *after* the NN went OOM). Here's an example of a jstack output on the client that was running:
> {noformat}
> $ hadoop fs -lsr /
> {noformat}
> Stacktrace:
> {noformat}
> /usr/java/jdk1.6.0_21/bin/jstack 19078
> 2013-06-19 23:14:00
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x00007fcd8c8c1800 nid=0x5105 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "IPC Client (1223039541) connection to ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root" daemon prio=10 tid=0x00007fcd8c7ea000 nid=0x4aa0 runnable [0x00007fcd443e2000]
>    java.lang.Thread.State: RUNNABLE
> 	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> 	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
> 	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> 	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> 	- locked <0x00007fcd7529de18> (a sun.nio.ch.Util$1)
> 	- locked <0x00007fcd7529de00> (a java.util.Collections$UnmodifiableSet)
> 	- locked <0x00007fcd7529da80> (a sun.nio.ch.EPollSelectorImpl)
> 	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> 	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:116)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:116)
> 	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> 	- locked <0x00007fcd752aaf18> (a java.io.BufferedInputStream)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:370)
> 	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
> "Low Memory Detector" daemon prio=10 tid=0x00007fcd8c090000 nid=0x4a9b runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CompilerThread1" daemon prio=10 tid=0x00007fcd8c08d800 nid=0x4a9a waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CompilerThread0" daemon prio=10 tid=0x00007fcd8c08a800 nid=0x4a99 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007fcd8c088800 nid=0x4a98 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007fcd8c06a000 nid=0x4a97 in Object.wait() [0x00007fcd902e9000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00007fcd75fc0470> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
> 	- locked <0x00007fcd75fc0470> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> "Reference Handler" daemon prio=10 tid=0x00007fcd8c068000 nid=0x4a96 in Object.wait() [0x00007fcd903ea000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00007fcd75fc0550> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:485)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> 	- locked <0x00007fcd75fc0550> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007fcd8c00a800 nid=0x4a92 in Object.wait() [0x00007fcd91b06000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00007fcd752528e8> (a org.apache.hadoop.ipc.Client$Call)
> 	at java.lang.Object.wait(Object.java:485)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1284)
> 	- locked <0x00007fcd752528e8> (a org.apache.hadoop.ipc.Client$Call)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1250)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
> 	at $Proxy9.getFileInfo(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> 	at $Proxy9.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:649)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1599)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:838)
> 	at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1684)
> 	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1630)
> 	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1605)
> 	at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
> 	at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:224)
> 	at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:207)
> 	at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
> 	at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
> 	at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> 	at org.apache.hadoop.fs.FsShell.main(FsShell.java:305)
> "VM Thread" prio=10 tid=0x00007fcd8c064000 nid=0x4a95 runnable 
> "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fcd8c01d800 nid=0x4a93 runnable 
> "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fcd8c01f800 nid=0x4a94 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x00007fcd8c09a800 nid=0x4a9c waiting on condition 
> JNI global references: 1086
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)