Posted to common-issues@hadoop.apache.org by "raanan (JIRA)" <ji...@apache.org> on 2014/06/23 14:13:24 UTC

[jira] [Commented] (HADOOP-10412) First call from Client fails after Server restart

    [ https://issues.apache.org/jira/browse/HADOOP-10412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040669#comment-14040669 ] 

raanan commented on HADOOP-10412:
---------------------------------

This also happened for me when the HDFS client tried to get block locations from the NN after an NN restart.

OS: Red Hat Enterprise Linux Server release 6.4 (Santiago)
Java: 1.7.0_15

2014-06-23 03:35:40.0532 DEBUG IPC Client (1936601988) connection to nnhost:8020 from org.apache.hadoop.ipc.Client - closing ipc connection to nnhost:8020: null
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)

2014-06-23 03:35:40.0536 ERROR ...  Exception thrown when streaming
java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "dnhost"; destination host is: "nnhost":8020;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1351)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy56.getBlockLocations(Unknown Source)
        ....
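
Since the issue description quoted below reports that the same call succeeds when issued again, one possible client-side stopgap is to retry a call whose IOException has an EOFException in its cause chain, as in both traces here. The following is only a minimal sketch under that assumption, not a fix for the underlying IPC problem: the class RetryOnEof and its methods are hypothetical names, it uses only JDK classes and no Hadoop API, and it assumes the wrapped call is safe to repeat.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Hadoop.
final class RetryOnEof {
    private RetryOnEof() {}

    // Runs the call up to maxAttempts times. It retries only when the call
    // throws an IOException whose cause chain contains an EOFException.
    // Any other exception is rethrown immediately.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (!hasEofCause(e)) {
                    throw e;
                }
                last = e; // EOF-style failure: remember it and try again
            }
        }
        throw last; // every attempt failed with an EOF-style IOException
    }

    // True if t or any of its causes is an EOFException.
    static boolean hasEofCause(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof EOFException) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would wrap the failing RPC in a Callable and pass 2 as maxAttempts so that only the first failure after the restart is retried. This is probably reasonable for read-only calls such as getBlockLocations. Retrying a state-changing call should be considered more carefully.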



> First call from Client fails after Server restart
> -------------------------------------------------
>
>                 Key: HADOOP-10412
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10412
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.2.0
>         Environment: Linux : centos62-2 2.6.32-220.el6.x86_64,
> jdk : 1.7.0_15
>            Reporter: Arun Suresh
>
> This seems to happen only for ProtobufRpc based services. Could not reproduce using simple WritableRpc.
> Steps to reproduce :
> Consider the case of namenode HA failover. nn1 and nn2 are both namenodes, nn1 is 'active' and nn2 is 'standby'
> 1) Bring down nn1 process. Now nn2 is active
> 2) Bring nn1 process back up. Now nn1 is standby and nn2 is active.
> 3) Manually issue failover using command :
> {quote}
> $ hdfs haadmin -failover nn2 nn1
> {quote}
> It is observed that the first call always fails with the following exception:
> {quote}
> Operation failed: Failed to become active. Couldn't make NameNode at centos62-2/192.168.2.202:8020 active
> java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "centos62-2/192.168.2.202"; destination host is: "centos62-2":8020;
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1351)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> 	at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source)
> 	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClientSideTranslatorPB.java:100)
> 	at org.apache.hadoop.ha.HAServiceProtocolHelper.transitionToActive(HAServiceProtocolHelper.java:48)
> 	at org.apache.hadoop.ha.ZKFailoverController.becomeActive(ZKFailoverController.java:373)
> 	at org.apache.hadoop.ha.ZKFailoverController.access$900(ZKFailoverController.java:59)
> 	at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.becomeActive(ZKFailoverController.java:818)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:803)
> 	at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> Caused by: java.io.EOFException
> 	at java.io.DataInputStream.readInt(DataInputStream.java:392)
> 	at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:995)
> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:891)
> 	at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:673)
> 	at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:59)
> 	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:592)
> 	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:589)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> 	at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:589)
> 	at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
> 	at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
> 	at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1548)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
> {quote}
> The call succeeds if I issue the same command again.


