Posted to common-issues@hadoop.apache.org by "Jagane Sundar (JIRA)" <ji...@apache.org> on 2013/06/20 01:42:20 UTC

[jira] [Commented] (HADOOP-9654) IPC timeout doesn't seem to be kicking in

    [ https://issues.apache.org/jira/browse/HADOOP-9654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688652#comment-13688652 ] 

Jagane Sundar commented on HADOOP-9654:
---------------------------------------

Roman, pardon me if you already know this and are configuring your Bigtop test correctly. I have observed similar failures to time out in both the HDFS client to NameNode IPC (HDFS-4646) and the DataNode to NameNode IPC (HDFS-4858); please take a look at those two issues.

By default, ipc.client.ping is true. This means that the IPC layer sends out a periodic ping but never times out.

For a timeout to take effect, ipc.client.ping needs to be set to false and ipc.ping.interval needs to be set to some value, e.g. 14000. With this configuration, the IPC client should time out after 14000 ms. Is Bigtop configuring Hadoop this way?
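
To make the rule above concrete, here is a minimal, self-contained sketch of how the two settings combine. It is only a model of the behaviour described in this comment, not the actual code in org.apache.hadoop.ipc.Client. The class name, the method name, the use of a plain Map in place of a Hadoop Configuration, and the -1 "no timeout" sentinel are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class IpcTimeoutSketch {
    // Property names as given in the comment above.
    static final String PING_KEY = "ipc.client.ping";
    static final String PING_INTERVAL_KEY = "ipc.ping.interval";

    // Hypothetical sentinel meaning "never time out".
    static final int NO_TIMEOUT = -1;

    /**
     * Models the rule described above: with ping enabled (the default) the
     * client keeps pinging and never times out; with ping disabled, the
     * ping interval acts as the timeout in milliseconds.
     */
    static int effectiveTimeoutMillis(Map<String, String> conf) {
        boolean ping = Boolean.parseBoolean(conf.getOrDefault(PING_KEY, "true"));
        if (ping) {
            return NO_TIMEOUT;
        }
        String interval = conf.get(PING_INTERVAL_KEY);
        if (interval == null) {
            // Assumption of this sketch: an unset interval is treated as a
            // configuration error rather than falling back to a default.
            throw new IllegalArgumentException(PING_INTERVAL_KEY + " is not set");
        }
        return Integer.parseInt(interval);
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        System.out.println("default config: " + effectiveTimeoutMillis(defaults));

        Map<String, String> tuned = new HashMap<>();
        tuned.put(PING_KEY, "false");
        tuned.put(PING_INTERVAL_KEY, "14000");
        System.out.println("ping disabled:  " + effectiveTimeoutMillis(tuned));
    }
}
```

In a real deployment the two properties would be set in the client-side Hadoop configuration rather than in a Map; the sketch only shows why leaving ipc.client.ping at its default leaves the client waiting indefinitely.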

                
> IPC timeout doesn't seem to be kicking in
> -----------------------------------------
>
>                 Key: HADOOP-9654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9654
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.1.0-beta
>            Reporter: Roman Shaposhnik
>
> During my Bigtop testing I made the NN go OOM. This, in turn, left all of the clients stuck in the IPC call (even new clients that I ran *after* the NN went OOM). Here's an example of jstack output from a client that was running:
> {noformat}
> $ hadoop fs -lsr /
> {noformat}
> Stacktrace:
> {noformat}
> /usr/java/jdk1.6.0_21/bin/jstack 19078
> 2013-06-19 23:14:00
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (17.0-b16 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x00007fcd8c8c1800 nid=0x5105 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "IPC Client (1223039541) connection to ip-10-144-82-213.ec2.internal/10.144.82.213:17020 from root" daemon prio=10 tid=0x00007fcd8c7ea000 nid=0x4aa0 runnable [0x00007fcd443e2000]
>    java.lang.Thread.State: RUNNABLE
> 	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> 	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
> 	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> 	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> 	- locked <0x00007fcd7529de18> (a sun.nio.ch.Util$1)
> 	- locked <0x00007fcd7529de00> (a java.util.Collections$UnmodifiableSet)
> 	- locked <0x00007fcd7529da80> (a sun.nio.ch.EPollSelectorImpl)
> 	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> 	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> 	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:116)
> 	at java.io.FilterInputStream.read(FilterInputStream.java:116)
> 	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:421)
> 	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> 	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> 	- locked <0x00007fcd752aaf18> (a java.io.BufferedInputStream)
> 	at java.io.DataInputStream.readInt(DataInputStream.java:370)
> 	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:943)
> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:840)
> "Low Memory Detector" daemon prio=10 tid=0x00007fcd8c090000 nid=0x4a9b runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CompilerThread1" daemon prio=10 tid=0x00007fcd8c08d800 nid=0x4a9a waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "CompilerThread0" daemon prio=10 tid=0x00007fcd8c08a800 nid=0x4a99 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007fcd8c088800 nid=0x4a98 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007fcd8c06a000 nid=0x4a97 in Object.wait() [0x00007fcd902e9000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00007fcd75fc0470> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
> 	- locked <0x00007fcd75fc0470> (a java.lang.ref.ReferenceQueue$Lock)
> 	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
> 	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> "Reference Handler" daemon prio=10 tid=0x00007fcd8c068000 nid=0x4a96 in Object.wait() [0x00007fcd903ea000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00007fcd75fc0550> (a java.lang.ref.Reference$Lock)
> 	at java.lang.Object.wait(Object.java:485)
> 	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
> 	- locked <0x00007fcd75fc0550> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007fcd8c00a800 nid=0x4a92 in Object.wait() [0x00007fcd91b06000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00007fcd752528e8> (a org.apache.hadoop.ipc.Client$Call)
> 	at java.lang.Object.wait(Object.java:485)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1284)
> 	- locked <0x00007fcd752528e8> (a org.apache.hadoop.ipc.Client$Call)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1250)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
> 	at $Proxy9.getFileInfo(Unknown Source)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> 	at $Proxy9.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:649)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1599)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:838)
> 	at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1684)
> 	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1630)
> 	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1605)
> 	at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
> 	at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:224)
> 	at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:207)
> 	at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
> 	at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
> 	at org.apache.hadoop.fs.FsShell.run(FsShell.java:255)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> 	at org.apache.hadoop.fs.FsShell.main(FsShell.java:305)
> "VM Thread" prio=10 tid=0x00007fcd8c064000 nid=0x4a95 runnable 
> "GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fcd8c01d800 nid=0x4a93 runnable 
> "GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fcd8c01f800 nid=0x4a94 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x00007fcd8c09a800 nid=0x4a9c waiting on condition 
> JNI global references: 1086
> {noformat}
