Posted to common-issues@hadoop.apache.org by "Uma Maheswara Rao G (Commented) (JIRA)" <ji...@apache.org> on 2012/02/01 05:46:59 UTC

[jira] [Commented] (HADOOP-7047) RPC client gets stuck

    [ https://issues.apache.org/jira/browse/HADOOP-7047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197577#comment-13197577 ] 

Uma Maheswara Rao G commented on HADOOP-7047:
---------------------------------------------

In one of my clusters I faced a similar situation.
The client got an OOME in the DataStreamer thread and went into processDatanodeError. There, while creating the datanode proxy connection, it hung.

Here is the dump (attached as well).
{code}
"DataStreamer for file /ngcdn/report/file/toptraffic/20120120-102619003-91.log.tmp block blk_1326295273061_564234" daemon prio=10 tid=0xfec4e000 nid=0x38d0 in Object.wait() [0xffff1000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:485)
	at org.apache.hadoop.ipc.Client.call(Client.java:940)
	- locked <0xb0a9d1e0> (a org.apache.hadoop.ipc.Client$Call)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
	at $Proxy6.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:389)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:376)
	at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:413)
	at org.apache.hadoop.hdfs.DFSClient.createClientDatanodeProtocolProxy(DFSClient.java:282)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3397)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2809)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3024)
	- locked <0xc55ad1e8> (a java.util.LinkedList)
{code}

I am on version 0.20.2.
We have already merged the fix that Hairong pointed out here.
{code}
    try {
      // wait here for work - read or close connection
      while (waitForWork()) {
        receiveResponse();
      }
    } catch (Throwable t) {
      // This truly is unexpected, since we catch IOException in receiveResponse
      // -- this is only to be really sure that we don't leave a client hanging
      // forever.
      LOG.warn("Unexpected error reading responses on connection " + this, t);
      markClosed(new IOException("Error reading responses", t));
    }
{code}

Looking at this, it should mark the connection closed and notify the waiting calls.

This did not happen; somehow the thread exited. We can see this from the attached dump: only the namenode IPC Client thread is there, and the DataNode IPC Client thread is missing.
Unfortunately I ran with INFO logs and did not enable console logs, and I did not see any OOME from the IPC Client threads in the INFO logs.
If this thread silently exited with some exception, it would only have been logged to the console.

The only possible explanation I see here is that an OOME was thrown again from inside the catch (Throwable) block?
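To illustrate that failure mode, here is a minimal self-contained sketch (hypothetical, not Hadoop code; all class and variable names are made up). If the catch (Throwable) handler itself fails before reaching markClosed() -- for example, the log-message string concatenation triggers a second OOME -- the waiting caller is never notified. Moving the cleanup into a finally block guarantees the waiter is released either way:

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

public class StuckClientSketch {
    static final Object call = new Object();  // stands in for Client$Call
    static volatile boolean done = false;     // set once the call is marked closed

    // Analogue of Client.markClosed(): record the error and wake all waiters.
    static void markClosed(IOException e) {
        synchronized (call) {
            done = true;
            call.notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch started = new CountDownLatch(1);

        // The RPC caller: waits on the call object, like Client.call() does.
        Thread caller = new Thread(() -> {
            synchronized (call) {
                started.countDown();
                while (!done) {
                    try {
                        call.wait(5000);
                    } catch (InterruptedException ignore) {
                    }
                    if (!done) {
                        System.out.println("caller stuck");
                        return;
                    }
                }
            }
            System.out.println("caller notified");
        });
        caller.start();
        started.await();

        // The connection reader thread: the receive loop dies with an OOME.
        Thread reader = new Thread(() -> {
            try {
                throw new OutOfMemoryError("simulated heap exhaustion");
            } catch (Throwable t) {
                // If the handler allocated here (e.g. building the warn
                // message) and OOMEd again, markClosed() would be skipped
                // and the caller would hang forever.
            } finally {
                // The finally block runs even if the handler itself fails.
                markClosed(new IOException("Error reading responses"));
            }
        });
        reader.start();
        reader.join();
        caller.join();
    }
}
```

With the cleanup in finally, the caller prints "caller notified" instead of blocking; without it, a second error in the handler would reproduce the hang seen in the attached dump.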

                
> RPC client gets stuck
> ---------------------
>
>                 Key: HADOOP-7047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7047
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: jstack.log, trunkStuckClient.patch
>
>
> One of the dfs clients in our cluster got stuck waiting for an RPC result. However, the IPC connection thread that was receiving the RPC result died on an OOM error:
> INFO >> Exception in thread "IPC Client (47) connection to XX from root" java.lang.OutOfMemoryError: Java heap space
> INFO >> at java.util.Arrays.copyOfRange(Arrays.java:3209)
> INFO >> at java.lang.String.<init>(String.java:216)
> INFO >> at java.lang.StringBuffer.toString(StringBuffer.java:585)
> INFO >> at java.net.URI.toString(URI.java:1907)
> INFO >> at java.net.URI.<init>(URI.java:732)
> INFO >> at org.apache.hadoop.fs.Path.initialize(Path.java:137)
> INFO >> at org.apache.hadoop.fs.Path.<init>(Path.java:126)
> INFO >> at org.apache.hadoop.fs.FileStatus.readFields(FileStatus.java:206)
> INFO >> at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
> INFO >> at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:171)
> INFO >> at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:219)
> INFO >> at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
> INFO >> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:531)
> INFO >> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:466)
