Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/11/17 23:57:44 UTC

[jira] Commented: (HADOOP-4672) RPC on Datanode blocked forever.

    [ https://issues.apache.org/jira/browse/HADOOP-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648399#action_12648399 ] 

Raghu Angadi commented on HADOOP-4672:
--------------------------------------

Also, interrupting the thread did not help: {{epoll_wait()}} returns, but the thread then calls {{epoll_wait()}} again.

Datanode stack trace:

{noformat}
"DataNode:[]" daemon prio=10 tid=0x083b1800 nid=0x5a67 runnable [0xb110b000..0xb110c120]
   java.lang.Thread.State: RUNNABLE
  at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
  at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
  at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
  at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
  - locked <0xf3944168> (a sun.nio.ch.Util$1)
  - locked <0xf3944158> (a java.util.Collections$UnmodifiableSet)
  - locked <0xf3944178> (a sun.nio.ch.EPollSelectorImpl)
  at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
  at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:237)
  at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
  at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
  at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
  at org.apache.hadoop.ipc.Client$Connection$2.write(Client.java:214)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
  - locked <0xb6dc7d50> (a java.io.BufferedOutputStream)
  at java.io.DataOutputStream.write(DataOutputStream.java:90)
  - locked <0xb6dc7d38> (a java.io.DataOutputStream)
  at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:357)
  - locked <0xb6dc7d38> (a java.io.DataOutputStream)
  at org.apache.hadoop.ipc.Client.call(Client.java:549)
  - locked <0xf3944228> (a org.apache.hadoop.ipc.Client$Call)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
  at org.apache.hadoop.dfs.$Proxy4.blockReport(Unknown Source)
  at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:670)
  at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2696)
  at java.lang.Thread.run(Thread.java:619)
{noformat}
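
For illustration only, here is a minimal sketch of a select loop that stops when the thread is interrupted or a deadline passes, instead of re-entering {{select()}}. This is not the code in {{SocketIOWithTimeout}}; the class {{InterruptAwareSelect}} and the methods {{selectWithDeadline()}} and {{demo()}} are hypothetical names, and a {{Pipe}} that is never written to stands in for the socket.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical example class; not part of Hadoop.
public class InterruptAwareSelect {

    /**
     * Waits until a key is ready, the timeout expires, or the calling
     * thread is interrupted. Returns the number of ready keys, or 0 on timeout.
     */
    static int selectWithDeadline(Selector selector, long timeoutMs) throws IOException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            // select(0) would block indefinitely, so stop once no time is left.
            if (remainingMs <= 0) {
                return 0;
            }
            int ready = selector.select(remainingMs);
            if (ready > 0) {
                return ready;
            }
            // Thread.interrupt() wakes the selector and leaves the interrupt
            // status set; check it here rather than calling select() again.
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedIOException("interrupted while waiting in select()");
            }
        }
    }

    /** Runs the wait on a separate thread and reports how it ended. */
    static String demo(long timeoutMs, boolean interrupt) throws Exception {
        Pipe pipe = Pipe.open();
        Selector selector = Selector.open();
        try {
            // Nothing is ever written to the pipe, so the source is never readable.
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);

            final String[] outcome = new String[1];
            Thread waiter = new Thread(() -> {
                try {
                    int n = selectWithDeadline(selector, timeoutMs);
                    outcome[0] = (n > 0) ? "ready" : "timeout";
                } catch (InterruptedIOException e) {
                    outcome[0] = "interrupted";
                } catch (IOException e) {
                    outcome[0] = "error: " + e;
                }
            });
            waiter.start();
            if (interrupt) {
                Thread.sleep(100);
                waiter.interrupt();
            }
            waiter.join();
            return outcome[0];
        } finally {
            selector.close();
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(60_000, true));
        System.out.println(demo(200, false));
    }
}
```

With a loop like this, an interrupt should end the wait promptly rather than leave the thread blocked until the full timeout. Whether the stuck datanode thread follows this pattern is exactly what the trace above calls into question.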

> RPC on Datanode blocked forever.
> --------------------------------
>
>                 Key: HADOOP-4672
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4672
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs, io
>    Affects Versions: 0.17.0
>         Environment: Java SE 1.6.0-b105 on Linux 2.6.x
>            Reporter: Raghu Angadi
>
> We recently noticed that a number of datanodes got stuck. The main thread that sends heartbeats and block reports is blocked in select() inside the blockReport() RPC. I will add a stack trace in the next comment.
> I am not sure why select() was blocked forever, since there is no connection open to the NameNode. In fact, the NN was restarted in between. It could be a JDK bug or a Hadoop bug.
