Posted to hdfs-user@hadoop.apache.org by YouPeng Yang <yy...@gmail.com> on 2013/12/02 13:04:17 UTC

auto-failover does not work

Hi
  I'm testing HA auto-failover with hadoop-2.2.0.

  The cluster can fail over manually; however, automatic failover fails.
I set up HA according to the URL

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html

  When I test automatic failover, I kill the active NN with kill -9
<Pid-nn>, but the standby namenode does not transition to the active state.
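
  For reference, the simulation amounts to something like this (a sketch;
jps is just one way to find the pid):

    # on the active NameNode host
    jps | grep NameNode    # find the NameNode pid
    kill -9 <Pid-nn>       # hard-kill it; the standby's ZKFC should then fence and take over
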
  The log from my DFSZKFailoverController is shown at [1].

 Please help me; any suggestions will be appreciated.


Regards.


zkfc log[1]----------------------------------------------------------------------------------------------------

2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
Beginning Service Fencing Process... ======
2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying method
1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
Connecting to hadoop3...
2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Connecting to hadoop3 port 22
2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Connection established
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Remote version string: SSH-2.0-OpenSSH_5.3
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Local version string: SSH-2.0-JSCH-0.1.42
2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
CheckCiphers:
aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
aes256-ctr is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
aes192-ctr is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
aes256-cbc is not available.
2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
aes192-cbc is not available.
2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
arcfour256 is not available.
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
SSH_MSG_KEXINIT sent
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
SSH_MSG_KEXINIT received
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
kex: server->client aes128-ctr hmac-md5 none
2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
kex: client->server aes128-ctr hmac-md5 none
2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
SSH_MSG_KEXDH_INIT sent
2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
expecting SSH_MSG_KEXDH_REPLY
2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
ssh_rsa_verify: signature true
2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Permanently added 'hadoop3' (RSA) to the list of known hosts.
2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
SSH_MSG_NEWKEYS sent
2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
SSH_MSG_NEWKEYS received
2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
SSH_MSG_SERVICE_REQUEST sent
2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
SSH_MSG_SERVICE_ACCEPT received
2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Authentications that can continue:
gssapi-with-mic,publickey,keyboard-interactive,password
2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Next authentication method: gssapi-with-mic
2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Authentications that can continue: publickey,keyboard-interactive,password
2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Next authentication method: publickey
2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
Disconnecting from hadoop3 port 22
2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort: Unable
to connect to hadoop3 as user hadoop
com.jcraft.jsch.JSchException: Auth fail
    at com.jcraft.jsch.Session.connect(Session.java:452)
    at
org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
    at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
    at
org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
    at
org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at
org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at
org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at
org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to
fence service by any configured method.
2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController:
Local service NameNode at hadoop2/10.7.23.125:8020 entered state:
SERVICE_NOT_RESPONDING
2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector:
Exception handling the winning of election
java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
10.7.23.124:8020
    at
org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
    at
org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
    at
org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
    at
org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
    at
org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
    at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
    at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
    at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Trying to re-establish ZK session
2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
0x2429313c808025b closed
2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181
sessionTimeout=5000
watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to
authenticate using SASL (Unable to locate a login configuration)
2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hadoop3/10.7.23.124:2181, initiating session
2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server hadoop3/10.7.23.124:2181, sessionid =
0x3429312ba330262, negotiated timeout = 5000
2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session connected.
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController:
Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and
marking that fencing is necessary
2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Yielding from election
2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
0x3429312ba330262 closed
2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector:
Ignoring stale result from old client with sessionId 0x3429312ba330262
2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down

Re: auto-failover does not work

Posted by YouPeng Yang <yy...@gmail.com>.
Hi
   Thanks for your reply. It works.
   Formerly, I had set up SSH with a passphrase on the key, and before
running start-dfs.sh or stop-dfs.sh I had to enter the password once via
ssh-agent bash and ssh-add.
   Now I have recreated the RSA key without a passphrase. Finally it works:
HA does the automatic failover.
   But I still think it is safer to create the RSA key with a passphrase.
   Can I still achieve HA automatic failover with an SSH setup whose key
has a passphrase?
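
   For reference, a common workaround is to keep the passphrase on your
everyday key and give the fencer its own passphrase-less key: sshfence
reads the file named by dfs.ha.fencing.ssh.private-key-files directly and
has no way to supply a passphrase at fencing time. A minimal sketch (the
id_rsa_fence name is my assumption; any path works):

    # on each NameNode host, as the hadoop user
    ssh-keygen -t rsa -P "" -f /home/hadoop/.ssh/id_rsa_fence
    ssh-copy-id -i /home/hadoop/.ssh/id_rsa_fence.pub hadoop@hadoop2
    ssh-copy-id -i /home/hadoop/.ssh/id_rsa_fence.pub hadoop@hadoop3
    # then point dfs.ha.fencing.ssh.private-key-files in hdfs-site.xml at
    # /home/hadoop/.ssh/id_rsa_fence on both NameNodes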


Regards


2013/12/2 Jitendra Yadav <je...@gmail.com>

> If you are using the hadoop user and your ssh configuration is correct,
> then the commands below should work without a password.
>
> Execute from NN2 & NN1
> # ssh hadoop@NN1_host
>
> &
>
> Execute from NN2 & NN1
> # ssh hadoop@NN2_host
>
> Regards
> Jitendra
>
>
>
> On Mon, Dec 2, 2013 at 6:10 PM, YouPeng Yang <yy...@gmail.com> wrote:
>
>> Hi Jitendra
>>   Yes.
>>   My doubt is that I need to run ssh-agent bash & ssh-add before the
>> NameNodes can ssh to each other. Is that a problem?
>>
>> Regards
>>
>>
>>
>>
>> 2013/12/2 Jitendra Yadav <je...@gmail.com>
>>
>>> Are you able to connect to both NN hosts using SSH without a password?
>>> Make sure you have the correct ssh keys in the authorized_keys file.
>>>
>>> Regards
>>> Jitendra
>>>
>>>
>>> On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yy...@gmail.com> wrote:
>>>
>>>> Hi Pavan
>>>>
>>>>
>>>>   I'm using sshfence
>>>>
>>>> ------core-site.xml-----------------
>>>>
>>>> <configuration>
>>>>  <property>
>>>>      <name>fs.defaultFS</name>
>>>>      <value>hdfs://lklcluster</value>
>>>>      <final>true</final>
>>>>  </property>
>>>>
>>>>  <property>
>>>>      <name>hadoop.tmp.dir</name>
>>>>      <value>/home/hadoop/tmp2</value>
>>>>  </property>
>>>>
>>>>
>>>> </configuration>
>>>>
>>>>
>>>> -------hdfs-site.xml-------------
>>>>
>>>> <configuration>
>>>>  <property>
>>>>      <name>dfs.namenode.name.dir</name>
>>>>     <value>/home/hadoop/namedir2</value>
>>>>  </property>
>>>>
>>>>  <property>
>>>>      <name>dfs.datanode.data.dir</name>
>>>>      <value>/home/hadoop/datadir2</value>
>>>>  </property>
>>>>
>>>>  <property>
>>>>    <name>dfs.nameservices</name>
>>>>    <value>lklcluster</value>
>>>> </property>
>>>>
>>>> <property>
>>>>     <name>dfs.ha.namenodes.lklcluster</name>
>>>>     <value>nn1,nn2</value>
>>>> </property>
>>>> <property>
>>>>   <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>>>>   <value>hadoop2:8020</value>
>>>> </property>
>>>> <property>
>>>>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:8020</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.namenode.http-address.lklcluster.nn1</name>
>>>>     <value>hadoop2:50070</value>
>>>> </property>
>>>>
>>>> <property>
>>>>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>>>>     <value>hadoop3:50070</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.namenode.shared.edits.dir</name>
>>>>
>>>> <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
>>>> </property>
>>>> <property>
>>>>   <name>dfs.client.failover.proxy.provider.lklcluster</name>
>>>>
>>>> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>>>> </property>
>>>> <property>
>>>>   <name>dfs.ha.fencing.methods</name>
>>>>   <value>sshfence</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.ha.fencing.ssh.private-key-files</name>
>>>>    <value>/home/hadoop/.ssh/id_rsa</value>
>>>> </property>
>>>>
>>>> <property>
>>>>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>>>>      <value>5000</value>
>>>> </property>
>>>>
>>>> <property>
>>>>   <name>dfs.journalnode.edits.dir</name>
>>>>    <value>/home/hadoop/journal/data</value>
>>>> </property>
>>>>
>>>> <property>
>>>>    <name>dfs.ha.automatic-failover.enabled</name>
>>>>       <value>true</value>
>>>> </property>
>>>>
>>>> <property>
>>>>      <name>ha.zookeeper.quorum</name>
>>>>      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
>>>> </property>
>>>>
>>>> </configuration>
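>>>>
>>>> (Aside: dfs.ha.fencing.methods also accepts an optional user and port,
>>>> e.g. sshfence(hadoop:22), and shell(/path/to/script.sh) is an
>>>> alternative fencer when key-based SSH is not an option.)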
>>>>
>>>>
>>>> 2013/12/2 Pavan Kumar Polineni <sm...@gmail.com>
>>>>
>>>>> Post your config files and tell us which fencing method you are using
>>>>> for automatic failover.
>>>>>
>>>>>
>>>>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <
>>>>> yypvsxf19870706@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>   I'm testing HA auto-failover with hadoop-2.2.0.
>>>>>>
>>>>>>   The cluster can fail over manually; however, automatic failover
>>>>>> fails.
>>>>>> I set up HA according to the URL
>>>>>>
>>>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>>>>
>>>>>>   When I test automatic failover, I kill the active NN with kill
>>>>>> -9 <Pid-nn>, but the standby namenode does not transition to the
>>>>>> active state.
>>>>>>   The log from my DFSZKFailoverController is shown at [1].
>>>>>>
>>>>>>  Please help me; any suggestions will be appreciated.
>>>>>>
>>>>>>
>>>>>> Regards.
>>>>>>
>>>>>>
>>>>>> zkfc log[1]----------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>>>>>> Beginning Service Fencing Process... ======
>>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>>>>>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>>>>>> Connecting to hadoop3...
>>>>>> 2013-12-02 19:49:28,590 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>>>>> 2013-12-02 19:49:28,592 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>>>>> 2013-12-02 19:49:28,603 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string:
>>>>>> SSH-2.0-OpenSSH_5.3
>>>>>> 2013-12-02 19:49:28,603 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string:
>>>>>> SSH-2.0-JSCH-0.1.42
>>>>>> 2013-12-02 19:49:28,603 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers:
>>>>>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>>>>> 2013-12-02 19:49:28,608 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>>>>> 2013-12-02 19:49:28,609 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr
>>>>>> hmac-md5 none
>>>>>> 2013-12-02 19:49:28,610 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr
>>>>>> hmac-md5 none
>>>>>> 2013-12-02 19:49:28,617 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>>>>> 2013-12-02 19:49:28,617 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>>>>> 2013-12-02 19:49:28,634 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>>>>> 2013-12-02 19:49:28,635 WARN
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3'
>>>>>> (RSA) to the list of known hosts.
>>>>>> 2013-12-02 19:49:28,635 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>>>>> 2013-12-02 19:49:28,635 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>>>>> 2013-12-02 19:49:28,636 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>>>>> 2013-12-02 19:49:28,637 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>>>>> 2013-12-02 19:49:28,638 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>>>> continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,639 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>>>> gssapi-with-mic
>>>>>> 2013-12-02 19:49:28,642 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>>>> continue: publickey,keyboard-interactive,password
>>>>>> 2013-12-02 19:49:28,642 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>>>> publickey
>>>>>> 2013-12-02 19:49:28,644 INFO
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3
>>>>>> port 22
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>>>>>> Unable to connect to hadoop3 as user hadoop
>>>>>> com.jcraft.jsch.JSchException: Auth fail
>>>>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>>>>>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>>>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable
>>>>>> to fence service by any configured method.
>>>>>> 2013-12-02 19:49:28,645 INFO
>>>>>> org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at
>>>>>> hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
>>>>>> 2013-12-02 19:49:28,646 WARN
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning
>>>>>> of election
>>>>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>>>>>> 10.7.23.124:8020
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,646 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>>>>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>>> 0x2429313c808025b closed
>>>>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper:
>>>>>> Initiating client connection,
>>>>>> connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000
>>>>>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>>>> socket connection to server hadoop3/10.7.23.124:2181. Will not
>>>>>> attempt to authenticate using SASL (Unable to locate a login configuration)
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>>>> connection established to hadoop3/10.7.23.124:2181, initiating
>>>>>> session
>>>>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>>>>>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid
>>>>>> = 0x3429312ba330262, negotiated timeout = 5000
>>>>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn:
>>>>>> EventThread shut down
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ZKFailoverController: Quitting master election for
>>>>>> NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is
>>>>>> necessary
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>>>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>>> 0x3429312ba330262 closed
>>>>>> 2013-12-02 19:49:29,728 WARN
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old
>>>>>> client with sessionId 0x3429312ba330262
>>>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>>>>> EventThread shut down
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
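
For reference, a quick end-to-end check of the fencing prerequisite, using
the key and hosts from this thread (BatchMode=yes makes ssh fail instead of
prompting, which mirrors the non-interactive ZKFC):

    # from hadoop2; run the mirror-image command from hadoop3
    ssh -i /home/hadoop/.ssh/id_rsa -o BatchMode=yes hadoop@hadoop3 true \
        && echo "fencing ssh OK"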

>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>>>>>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>>>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable
>>>>>> to fence service by any configured method.
>>>>>> 2013-12-02 19:49:28,645 INFO
>>>>>> org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at
>>>>>> hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
>>>>>> 2013-12-02 19:49:28,646 WARN
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning
>>>>>> of election
>>>>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>>>>>> 10.7.23.124:8020
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>>     at
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>>     at
>>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>>> 2013-12-02 19:49:28,646 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>>>>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>>> 0x2429313c808025b closed
>>>>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper:
>>>>>> Initiating client connection,
>>>>>> connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000
>>>>>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>>>> socket connection to server hadoop3/10.7.23.124:2181. Will not
>>>>>> attempt to authenticate using SASL (Unable to locate a login configuration)
>>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>>>> connection established to hadoop3/10.7.23.124:2181, initiating
>>>>>> session
>>>>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>>>>>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid
>>>>>> = 0x3429312ba330262, negotiated timeout = 5000
>>>>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn:
>>>>>> EventThread shut down
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ZKFailoverController: Quitting master election for
>>>>>> NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is
>>>>>> necessary
>>>>>> 2013-12-02 19:49:29,706 INFO
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>>>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>>> 0x3429312ba330262 closed
>>>>>> 2013-12-02 19:49:29,728 WARN
>>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old
>>>>>> client with sessionId 0x3429312ba330262
>>>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>>>>> EventThread shut down
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: auto-failover does not work

Posted by Jitendra Yadav <je...@gmail.com>.
If you are using the hadoop user and your ssh configuration is correct,
then the commands below should work without a password.

Execute from both NN2 and NN1:

# ssh hadoop@NN1_host

and

# ssh hadoop@NN2_host
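
For what it's worth, the quoted zkfc log below reaches the publickey step and
then fails with "com.jcraft.jsch.JSchException: Auth fail", so the fencing SSH
login itself is being rejected. As far as I know, sshfence uses the JSch
library and reads the key file named in dfs.ha.fencing.ssh.private-key-files
directly, so an ssh-agent/ssh-add session does not help it: the key must work
non-interactively (no passphrase), and its public half must be present in
authorized_keys on the node being fenced.

A minimal sketch of the setup, assuming OpenSSH and the key path from the
posted config; run as the hadoop user on each NameNode (hadoop2 and hadoop3
here):

# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
(only if no key exists yet; -N "" creates it without a passphrase)

# ssh-copy-id hadoop@hadoop2
# ssh-copy-id hadoop@hadoop3
# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys

Then verify a non-interactive login with the exact key sshfence will use
(BatchMode makes ssh fail instead of prompting for anything):

# ssh -o BatchMode=yes -i /home/hadoop/.ssh/id_rsa hadoop@hadoop3 hostname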

Regards
Jitendra
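
Separately, the quoted log ends with "Unable to fence service by any
configured method", and until some fencing method reports success the ZKFC
will not promote the standby. The HDFS HA documentation also describes a
shell(...) fencing method, with multiple methods listed one per line; one
workaround while the SSH keys are being sorted out is to add a trivially
succeeding shell fence as a fallback in hdfs-site.xml, along these lines:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>
      sshfence
      shell(/bin/true)
  </value>
</property>

Note that this fallback effectively disables real fencing, so a merely
unreachable (rather than dead) active NameNode could cause split-brain; treat
it as a debugging aid, not a production setting.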



On Mon, Dec 2, 2013 at 6:10 PM, YouPeng Yang <yy...@gmail.com>wrote:

> Hi Jitendra
>   Yes
>   My doubt is whether I need to run ssh-agent bash & ssh-add before I
> can ssh between the NNs. Is that a problem?
>
> Regards
>
>
>
>
> 2013/12/2 Jitendra Yadav <je...@gmail.com>
>
>> Are you able to connect to both NN hosts using SSH without a password?
>> Make sure you have the correct ssh keys in the authorized_keys file.
>>
>> Regards
>> Jitendra
>>
>>
>> On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yy...@gmail.com>wrote:
>>
>>> Hi Pavan
>>>
>>>
>>>   I'm using sshfence
>>>
>>> ------core-site.xml-----------------
>>>
>>> <configuration>
>>>  <property>
>>>      <name>fs.defaultFS</name>
>>>      <value>hdfs://lklcluster</value>
>>>      <final>true</final>
>>>  </property>
>>>
>>>  <property>
>>>      <name>hadoop.tmp.dir</name>
>>>      <value>/home/hadoop/tmp2</value>
>>>  </property>
>>>
>>>
>>> </configuration>
>>>
>>>
>>> -------hdfs-site.xml-------------
>>>
>>> <configuration>
>>>  <property>
>>>      <name>dfs.namenode.name.dir</name>
>>>     <value>/home/hadoop/namedir2</value>
>>>  </property>
>>>
>>>  <property>
>>>      <name>dfs.datanode.data.dir</name>
>>>      <value>/home/hadoop/datadir2</value>
>>>  </property>
>>>
>>>  <property>
>>>    <name>dfs.nameservices</name>
>>>    <value>lklcluster</value>
>>> </property>
>>>
>>> <property>
>>>     <name>dfs.ha.namenodes.lklcluster</name>
>>>     <value>nn1,nn2</value>
>>> </property>
>>> <property>
>>>   <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>>>   <value>hadoop2:8020</value>
>>> </property>
>>> <property>
>>>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>>>     <value>hadoop3:8020</value>
>>> </property>
>>>
>>> <property>
>>>   <name>dfs.namenode.http-address.lklcluster.nn1</name>
>>>     <value>hadoop2:50070</value>
>>> </property>
>>>
>>> <property>
>>>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>>>     <value>hadoop3:50070</value>
>>> </property>
>>>
>>> <property>
>>>   <name>dfs.namenode.shared.edits.dir</name>
>>>
>>> <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
>>> </property>
>>> <property>
>>>   <name>dfs.client.failover.proxy.provider.lklcluster</name>
>>>
>>> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>>> </property>
>>> <property>
>>>   <name>dfs.ha.fencing.methods</name>
>>>   <value>sshfence</value>
>>> </property>
>>>
>>> <property>
>>>   <name>dfs.ha.fencing.ssh.private-key-files</name>
>>>    <value>/home/hadoop/.ssh/id_rsa</value>
>>> </property>
>>>
>>> <property>
>>>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>>>      <value>5000</value>
>>> </property>
>>>
>>> <property>
>>>   <name>dfs.journalnode.edits.dir</name>
>>>    <value>/home/hadoop/journal/data</value>
>>> </property>
>>>
>>> <property>
>>>    <name>dfs.ha.automatic-failover.enabled</name>
>>>       <value>true</value>
>>> </property>
>>>
>>> <property>
>>>      <name>ha.zookeeper.quorum</name>
>>>      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
>>> </property>
>>>
>>> </configuration>
>>>
>>>
>>> 2013/12/2 Pavan Kumar Polineni <sm...@gmail.com>
>>>
>>>> Post your config files and say which method you are following for
>>>> automatic failover.
>>>>
>>>>
>>>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yypvsxf19870706@gmail.com
>>>> > wrote:
>>>>
>>>>> Hi
>>>>>   I'm testing HA auto-failover with hadoop-2.2.0.
>>>>>
>>>>>   The cluster can be failed over manually; however, automatic
>>>>> failover fails.
>>>>> I set up HA according to the URL
>>>>>
>>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>>>
>>>>>   When I test automatic failover, I kill my active NN with kill -9
>>>>> <Pid-nn>, but the standby namenode does not change to the active state.
>>>>>   The resulting log from my DFSZKFailoverController is shown in [1].
>>>>>
>>>>>  Please help me; any suggestion will be appreciated.
>>>>>
>>>>>
>>>>> Regards.
>>>>>
>>>>>
>>>>> zkfc
>>>>> log[1]----------------------------------------------------------------------------------------------------
>>>>>
>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>>>>> Beginning Service Fencing Process... ======
>>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>>>>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>>>>> Connecting to hadoop3...
>>>>> 2013-12-02 19:49:28,590 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>>>> 2013-12-02 19:49:28,592 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>>>> 2013-12-02 19:49:28,603 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string:
>>>>> SSH-2.0-OpenSSH_5.3
>>>>> 2013-12-02 19:49:28,603 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string:
>>>>> SSH-2.0-JSCH-0.1.42
>>>>> 2013-12-02 19:49:28,603 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers:
>>>>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>>>> 2013-12-02 19:49:28,608 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>>>> 2013-12-02 19:49:28,608 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>>>> 2013-12-02 19:49:28,608 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>>>> 2013-12-02 19:49:28,608 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>>>> 2013-12-02 19:49:28,609 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>>>> 2013-12-02 19:49:28,610 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>>>> 2013-12-02 19:49:28,610 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>>>> 2013-12-02 19:49:28,610 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr
>>>>> hmac-md5 none
>>>>> 2013-12-02 19:49:28,610 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr
>>>>> hmac-md5 none
>>>>> 2013-12-02 19:49:28,617 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>>>> 2013-12-02 19:49:28,617 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>>>> 2013-12-02 19:49:28,634 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>>>> 2013-12-02 19:49:28,635 WARN
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3'
>>>>> (RSA) to the list of known hosts.
>>>>> 2013-12-02 19:49:28,635 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>>>> 2013-12-02 19:49:28,635 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>>>> 2013-12-02 19:49:28,636 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>>>> 2013-12-02 19:49:28,637 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>>>> 2013-12-02 19:49:28,638 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>>> continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>>>> 2013-12-02 19:49:28,639 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>>> gssapi-with-mic
>>>>> 2013-12-02 19:49:28,642 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>>> continue: publickey,keyboard-interactive,password
>>>>> 2013-12-02 19:49:28,642 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>>> publickey
>>>>> 2013-12-02 19:49:28,644 INFO
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3
>>>>> port 22
>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>>>>> Unable to connect to hadoop3 as user hadoop
>>>>> com.jcraft.jsch.JSchException: Auth fail
>>>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>>>     at
>>>>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>     at
>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>     at
>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>     at
>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>     at
>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>     at
>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>>>>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable
>>>>> to fence service by any configured method.
>>>>> 2013-12-02 19:49:28,645 INFO
>>>>> org.apache.hadoop.ha.ZKFailoverController: Local service NameNode at
>>>>> hadoop2/10.7.23.125:8020 entered state: SERVICE_NOT_RESPONDING
>>>>> 2013-12-02 19:49:28,646 WARN
>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning
>>>>> of election
>>>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>>>>> 10.7.23.124:8020
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>>     at
>>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>>     at
>>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>>     at
>>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>>     at
>>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>>     at
>>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>>     at
>>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>>> 2013-12-02 19:49:28,646 INFO
>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
>>>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>> 0x2429313c808025b closed
>>>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper:
>>>>> Initiating client connection,
>>>>> connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181 sessionTimeout=5000
>>>>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>>> socket connection to server hadoop3/10.7.23.124:2181. Will not
>>>>> attempt to authenticate using SASL (Unable to locate a login configuration)
>>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>>> connection established to hadoop3/10.7.23.124:2181, initiating session
>>>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>>>>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid
>>>>> = 0x3429312ba330262, negotiated timeout = 5000
>>>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn:
>>>>> EventThread shut down
>>>>> 2013-12-02 19:49:29,706 INFO
>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
>>>>> 2013-12-02 19:49:29,706 INFO
>>>>> org.apache.hadoop.ha.ZKFailoverController: Quitting master election for
>>>>> NameNode at hadoop2/10.7.23.125:8020 and marking that fencing is
>>>>> necessary
>>>>> 2013-12-02 19:49:29,706 INFO
>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>> 0x3429312ba330262 closed
>>>>> 2013-12-02 19:49:29,728 WARN
>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old
>>>>> client with sessionId 0x3429312ba330262
>>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>>>> EventThread shut down
>>>>>
>>>>
>>>>
>>>
>>
>

>>>>> necessary
>>>>> 2013-12-02 19:49:29,706 INFO
>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
>>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>>> 0x3429312ba330262 closed
>>>>> 2013-12-02 19:49:29,728 WARN
>>>>> org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old
>>>>> client with sessionId 0x3429312ba330262
>>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>>>> EventThread shut down
>>>>>
>>>>
>>>>
>>>
>>
>
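
The quoted log above shows the failure chain: the surviving ZKFC wins
the ZooKeeper election, tries to fence the old active NameNode over
SSH, hits "Auth fail", and then quits the election instead of
promoting its local NameNode. Once passwordless SSH works, the
promotion can be confirmed with the standard admin command; a quick
check, assuming the nn1/nn2 service IDs from the configs quoted below:

  # each command prints "active" or "standby"
  hdfs haadmin -getServiceState nn1
  hdfs haadmin -getServiceState nn2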

Re: auto-failover does not work

Posted by YouPeng Yang <yy...@gmail.com>.
Hi Jitendra
  Yes.
  My doubt is that I need to run ssh-agent bash and ssh-add before I
can ssh to each NN from the other. Is that a problem?
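
As far as I know, this is likely the problem: sshfence hands the key
file named in dfs.ha.fencing.ssh.private-key-files straight to JSch
and does not consult ssh-agent, so a key that needs ssh-add (i.e. one
with a passphrase) cannot be used for fencing. A minimal sketch of a
working setup, assuming the hadoop user on hadoop2 and hadoop3 and the
paths from the configs quoted below:

  # create a key with an empty passphrase (skip if a passphrase-free key exists)
  ssh-keygen -t rsa -N "" -f /home/hadoop/.ssh/id_rsa
  # install the public key on the peer NameNode; repeat in the other direction
  ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop3
  # verify: must log in with no password or passphrase prompt
  ssh -i /home/hadoop/.ssh/id_rsa hadoop@hadoop3 true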

Regards




2013/12/2 Jitendra Yadav <je...@gmail.com>

> Are you able to connect to both NN hosts over SSH without a password?
> Make sure you have the correct ssh keys in the authorized_keys file.
>
> Regards
> Jitendra
>
>
> On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yy...@gmail.com>wrote:
>
>> Hi Pavan
>>
>>
>>   I'm using sshfence
>>
>> ------core-site.xml-----------------
>>
>> <configuration>
>>  <property>
>>      <name>fs.defaultFS</name>
>>      <value>hdfs://lklcluster</value>
>>      <final>true</final>
>>  </property>
>>
>>  <property>
>>      <name>hadoop.tmp.dir</name>
>>      <value>/home/hadoop/tmp2</value>
>>  </property>
>>
>>
>> </configuration>
>>
>>
>> -------hdfs-site.xml-------------
>>
>> <configuration>
>>  <property>
>>      <name>dfs.namenode.name.dir</name>
>>     <value>/home/hadoop/namedir2</value>
>>  </property>
>>
>>  <property>
>>      <name>dfs.datanode.data.dir</name>
>>      <value>/home/hadoop/datadir2</value>
>>  </property>
>>
>>  <property>
>>    <name>dfs.nameservices</name>
>>    <value>lklcluster</value>
>> </property>
>>
>> <property>
>>     <name>dfs.ha.namenodes.lklcluster</name>
>>     <value>nn1,nn2</value>
>> </property>
>> <property>
>>   <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>>   <value>hadoop2:8020</value>
>> </property>
>> <property>
>>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>>     <value>hadoop3:8020</value>
>> </property>
>>
>> <property>
>>   <name>dfs.namenode.http-address.lklcluster.nn1</name>
>>     <value>hadoop2:50070</value>
>> </property>
>>
>> <property>
>>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>>     <value>hadoop3:50070</value>
>> </property>
>>
>> <property>
>>   <name>dfs.namenode.shared.edits.dir</name>
>>
>> <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
>> </property>
>> <property>
>>   <name>dfs.client.failover.proxy.provider.lklcluster</name>
>>
>> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
>> </property>
>> <property>
>>   <name>dfs.ha.fencing.methods</name>
>>   <value>sshfence</value>
>> </property>
>>
>> <property>
>>   <name>dfs.ha.fencing.ssh.private-key-files</name>
>>    <value>/home/hadoop/.ssh/id_rsa</value>
>> </property>
>>
>> <property>
>>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>>      <value>5000</value>
>> </property>
>>
>> <property>
>>   <name>dfs.journalnode.edits.dir</name>
>>    <value>/home/hadoop/journal/data</value>
>> </property>
>>
>> <property>
>>    <name>dfs.ha.automatic-failover.enabled</name>
>>       <value>true</value>
>> </property>
>>
>> <property>
>>      <name>ha.zookeeper.quorum</name>
>>      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
>> </property>
>>
>> </configuration>
>>
>>
>> 2013/12/2 Pavan Kumar Polineni <sm...@gmail.com>
>>
>>> Post your config files and tell us which method you are using for
>>> automatic failover.
>>>
>>>
>>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com>wrote:
>>>
>>>> Hi,
>>>>   I'm testing HA auto-failover with hadoop-2.2.0.
>>>>
>>>>   The cluster can be failed over manually; however, automatic
>>>> failover fails.
>>>> I set up HA according to the URL
>>>>
>>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>>
>>>>   When I test automatic failover, I kill my active NN with kill -9
>>>> <Pid-nn>, but the standby namenode does not transition to the active
>>>> state.
>>>>   The log from my DFSZKFailoverController is shown at [1].
>>>>
>>>>  Please help me; any suggestion will be appreciated.
>>>>
>>>>
>>>> Regards.
>>>>
>>>>
>>>> zkfc
>>>> log[1]----------------------------------------------------------------------------------------------------
>>>>
>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>>>> Beginning Service Fencing Process... ======
>>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>>>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>>>> Connecting to hadoop3...
>>>> 2013-12-02 19:49:28,590 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>>> 2013-12-02 19:49:28,592 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>>> 2013-12-02 19:49:28,603 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string:
>>>> SSH-2.0-OpenSSH_5.3
>>>> 2013-12-02 19:49:28,603 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string:
>>>> SSH-2.0-JSCH-0.1.42
>>>> 2013-12-02 19:49:28,603 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers:
>>>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>>> 2013-12-02 19:49:28,608 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>>> 2013-12-02 19:49:28,608 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>>> 2013-12-02 19:49:28,608 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>>> 2013-12-02 19:49:28,608 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>>> 2013-12-02 19:49:28,609 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>>> 2013-12-02 19:49:28,610 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>>> 2013-12-02 19:49:28,610 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>>> 2013-12-02 19:49:28,610 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr
>>>> hmac-md5 none
>>>> 2013-12-02 19:49:28,610 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr
>>>> hmac-md5 none
>>>> 2013-12-02 19:49:28,617 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>>> 2013-12-02 19:49:28,617 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>>> 2013-12-02 19:49:28,634 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>>> 2013-12-02 19:49:28,635 WARN
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3'
>>>> (RSA) to the list of known hosts.
>>>> 2013-12-02 19:49:28,635 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>>> 2013-12-02 19:49:28,635 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>>> 2013-12-02 19:49:28,636 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>>> 2013-12-02 19:49:28,637 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>>> 2013-12-02 19:49:28,638 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>> continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>>> 2013-12-02 19:49:28,639 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>> gssapi-with-mic
>>>> 2013-12-02 19:49:28,642 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>>> continue: publickey,keyboard-interactive,password
>>>> 2013-12-02 19:49:28,642 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>>> publickey
>>>> 2013-12-02 19:49:28,644 INFO
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3
>>>> port 22
>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>>>> Unable to connect to hadoop3 as user hadoop
>>>> com.jcraft.jsch.JSchException: Auth fail
>>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>>     at
>>>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>     at
>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>     at
>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>     at
>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>     at
>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>     at
>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>>>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable
>>>> to fence service by any configured method.
>>>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController:
>>>> Local service NameNode at hadoop2/10.7.23.125:8020 entered state:
>>>> SERVICE_NOT_RESPONDING
>>>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>>>> Exception handling the winning of election
>>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>>>> 10.7.23.124:8020
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>>     at
>>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>>     at
>>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>>     at
>>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>>     at
>>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>>     at
>>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>>     at
>>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>>> Trying to re-establish ZK session
>>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>> 0x2429313c808025b closed
>>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>>> client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181
>>>> sessionTimeout=5000
>>>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server hadoop3/10.7.23.124:2181. Will not attempt
>>>> to authenticate using SASL (Unable to locate a login configuration)
>>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>> connection established to hadoop3/10.7.23.124:2181, initiating session
>>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>>>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid =
>>>> 0x3429312ba330262, negotiated timeout = 5000
>>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn:
>>>> EventThread shut down
>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>>> Session connected.
>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController:
>>>> Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and
>>>> marking that fencing is necessary
>>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>>> Yielding from election
>>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>>> 0x3429312ba330262 closed
>>>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>>>> Ignoring stale result from old client with sessionId 0x3429312ba330262
>>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>>> EventThread shut down
>>>>
>>>
>>>
>>
>
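
Two notes on the fencing config quoted above. First, sshfence logs in
to the old active and kills whatever is listening on the service port
by running fuser, so fuser must be installed on both NameNode hosts.
Second, the HA docs allow listing a fallback method such as
shell(/bin/true) after sshfence, so failover can still proceed when
the old active's host is down and SSH cannot connect at all. A manual
dry run of roughly what the fencer does, assuming the hadoop user and
the nn2 host/port from the config (the real fencer adds -k to kill the
process):

  # run from the ZKFC host with the configured fencing key
  ssh -i /home/hadoop/.ssh/id_rsa hadoop@hadoop3 "fuser -v -n tcp 8020"

If this prompts for a password, automatic failover will fail with
exactly the Auth fail seen in the zkfc log.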

Re: auto-failover does not work

Posted by Jitendra Yadav <je...@gmail.com>.
Are you able to connect to both NN hosts over SSH without a password?
Make sure you have the correct ssh keys in the authorized_keys file.
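
A quick check, assuming the hadoop user and the key configured in
dfs.ha.fencing.ssh.private-key-files:

  # run on hadoop2; repeat in the other direction from hadoop3
  ssh -i /home/hadoop/.ssh/id_rsa hadoop@hadoop3 true
  # sshd ignores authorized_keys when its permissions are too open
  chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys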

Regards
Jitendra


On Mon, Dec 2, 2013 at 5:50 PM, YouPeng Yang <yy...@gmail.com>wrote:

> Hi Pavan
>
>
>   I'm using sshfence
>
> ------core-site.xml-----------------
>
> <configuration>
>  <property>
>      <name>fs.defaultFS</name>
>      <value>hdfs://lklcluster</value>
>      <final>true</final>
>  </property>
>
>  <property>
>      <name>hadoop.tmp.dir</name>
>      <value>/home/hadoop/tmp2</value>
>  </property>
>
>
> </configuration>
>
>
> -------hdfs-site.xml-------------
>
> <configuration>
>  <property>
>      <name>dfs.namenode.name.dir</name>
>     <value>/home/hadoop/namedir2</value>
>  </property>
>
>  <property>
>      <name>dfs.datanode.data.dir</name>
>      <value>/home/hadoop/datadir2</value>
>  </property>
>
>  <property>
>    <name>dfs.nameservices</name>
>    <value>lklcluster</value>
> </property>
>
> <property>
>     <name>dfs.ha.namenodes.lklcluster</name>
>     <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
>   <value>hadoop2:8020</value>
> </property>
> <property>
>     <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
>     <value>hadoop3:8020</value>
> </property>
>
> <property>
>   <name>dfs.namenode.http-address.lklcluster.nn1</name>
>     <value>hadoop2:50070</value>
> </property>
>
> <property>
>     <name>dfs.namenode.http-address.lklcluster.nn2</name>
>     <value>hadoop3:50070</value>
> </property>
>
> <property>
>   <name>dfs.namenode.shared.edits.dir</name>
>
> <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.lklcluster</name>
>
> <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> <property>
>   <name>dfs.ha.fencing.methods</name>
>   <value>sshfence</value>
> </property>
>
> <property>
>   <name>dfs.ha.fencing.ssh.private-key-files</name>
>    <value>/home/hadoop/.ssh/id_rsa</value>
> </property>
>
> <property>
>     <name>dfs.ha.fencing.ssh.connect-timeout</name>
>      <value>5000</value>
> </property>
>
> <property>
>   <name>dfs.journalnode.edits.dir</name>
>    <value>/home/hadoop/journal/data</value>
> </property>
>
> <property>
>    <name>dfs.ha.automatic-failover.enabled</name>
>       <value>true</value>
> </property>
>
> <property>
>      <name>ha.zookeeper.quorum</name>
>      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
> </property>
>
> </configuration>
>
>
> 2013/12/2 Pavan Kumar Polineni <sm...@gmail.com>
>
>> Post your config files and tell us which method you are following for
>> automatic failover.
>>
>>
>> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com>wrote:
>>
>>> Hi,
>>>   I'm testing the HA auto-failover within hadoop-2.2.0
>>>
>>>   The cluster can be failed over manually; however, the automatic
>>> failover fails.
>>> I set up HA according to this URL:
>>>
>>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>>
>>>   When I test automatic failover, I kill the active NN with kill -9
>>> <Pid-nn>, but the standby NameNode does not change to the active state.
>>>   The log from my DFSZKFailoverController is shown at [1].
>>>
>>>  Please help me; any suggestion will be appreciated.
>>>
>>>
>>> Regards.
>>>
>>>
>>> zkfc
>>> log[1]----------------------------------------------------------------------------------------------------
>>>
>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>>> Beginning Service Fencing Process... ======
>>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>>> Connecting to hadoop3...
>>> 2013-12-02 19:49:28,590 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connecting to hadoop3 port 22
>>> 2013-12-02 19:49:28,592 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Connection established
>>> 2013-12-02 19:49:28,603 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Remote version string:
>>> SSH-2.0-OpenSSH_5.3
>>> 2013-12-02 19:49:28,603 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Local version string:
>>> SSH-2.0-JSCH-0.1.42
>>> 2013-12-02 19:49:28,603 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: CheckCiphers:
>>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-ctr is not available.
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-ctr is not available.
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes256-cbc is not available.
>>> 2013-12-02 19:49:28,608 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: aes192-cbc is not available.
>>> 2013-12-02 19:49:28,609 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: arcfour256 is not available.
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT sent
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXINIT received
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: server->client aes128-ctr
>>> hmac-md5 none
>>> 2013-12-02 19:49:28,610 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: kex: client->server aes128-ctr
>>> hmac-md5 none
>>> 2013-12-02 19:49:28,617 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_KEXDH_INIT sent
>>> 2013-12-02 19:49:28,617 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: expecting SSH_MSG_KEXDH_REPLY
>>> 2013-12-02 19:49:28,634 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: ssh_rsa_verify: signature true
>>> 2013-12-02 19:49:28,635 WARN
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Permanently added 'hadoop3'
>>> (RSA) to the list of known hosts.
>>> 2013-12-02 19:49:28,635 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS sent
>>> 2013-12-02 19:49:28,635 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_NEWKEYS received
>>> 2013-12-02 19:49:28,636 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_REQUEST sent
>>> 2013-12-02 19:49:28,637 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: SSH_MSG_SERVICE_ACCEPT received
>>> 2013-12-02 19:49:28,638 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>> continue: gssapi-with-mic,publickey,keyboard-interactive,password
>>> 2013-12-02 19:49:28,639 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>> gssapi-with-mic
>>> 2013-12-02 19:49:28,642 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Authentications that can
>>> continue: publickey,keyboard-interactive,password
>>> 2013-12-02 19:49:28,642 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Next authentication method:
>>> publickey
>>> 2013-12-02 19:49:28,644 INFO
>>> org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Disconnecting from hadoop3
>>> port 22
>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>>> Unable to connect to hadoop3 as user hadoop
>>> com.jcraft.jsch.JSchException: Auth fail
>>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>>     at
>>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to
>>> fence service by any configured method.
>>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController:
>>> Local service NameNode at hadoop2/10.7.23.125:8020 entered state:
>>> SERVICE_NOT_RESPONDING
>>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>>> Exception handling the winning of election
>>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>>> 10.7.23.124:8020
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>>     at
>>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>>     at
>>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>>     at
>>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>> Trying to re-establish ZK session
>>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>>> 0x2429313c808025b closed
>>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>> client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181
>>> sessionTimeout=5000
>>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>>> socket connection to server hadoop3/10.7.23.124:2181. Will not attempt
>>> to authenticate using SASL (Unable to locate a login configuration)
>>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>>> connection established to hadoop3/10.7.23.124:2181, initiating session
>>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid =
>>> 0x3429312ba330262, negotiated timeout = 5000
>>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn:
>>> EventThread shut down
>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>> Session connected.
>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController:
>>> Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and
>>> marking that fencing is necessary
>>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>>> Yielding from election
>>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>>> 0x3429312ba330262 closed
>>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>>> Ignoring stale result from old client with sessionId 0x3429312ba330262
>>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn:
>>> EventThread shut down
>>>
>>
>>
>

Re: auto-failover does not work

Posted by YouPeng Yang <yy...@gmail.com>.
Hi Pavan


  I'm using sshfence

------core-site.xml-----------------

<configuration>
 <property>
     <name>fs.defaultFS</name>
     <value>hdfs://lklcluster</value>
     <final>true</final>
 </property>

 <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/hadoop/tmp2</value>
 </property>


</configuration>


-------hdfs-site.xml-------------

<configuration>
 <property>
     <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/namedir2</value>
 </property>

 <property>
     <name>dfs.datanode.data.dir</name>
     <value>/home/hadoop/datadir2</value>
 </property>

 <property>
   <name>dfs.nameservices</name>
   <value>lklcluster</value>
</property>

<property>
    <name>dfs.ha.namenodes.lklcluster</name>
    <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
  <value>hadoop2:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
    <value>hadoop3:8020</value>
</property>

<property>
  <name>dfs.namenode.http-address.lklcluster.nn1</name>
    <value>hadoop2:50070</value>
</property>

<property>
    <name>dfs.namenode.http-address.lklcluster.nn2</name>
    <value>hadoop3:50070</value>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.lklcluster</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
   <value>/home/hadoop/.ssh/id_rsa</value>
</property>

<property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
     <value>5000</value>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
   <value>/home/hadoop/journal/data</value>
</property>

<property>
   <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
</property>

<property>
     <name>ha.zookeeper.quorum</name>
     <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
</property>

</configuration>
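
For context, a quick way to watch the failover with the stock hdfs haadmin
CLI (a sketch assuming the service IDs nn1/nn2 configured above):

hdfs haadmin -getServiceState nn1    # expect: active
hdfs haadmin -getServiceState nn2    # expect: standby
# kill -9 the active NN on its host; after the ZK session timeout (5 s here)
# the standby should take over, once fencing succeeds:
hdfs haadmin -getServiceState nn2    # expect: active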


2013/12/2 Pavan Kumar Polineni <sm...@gmail.com>

> Post your config files and tell us which method you are following for
> automatic failover.
>
>
> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com>wrote:
>
>> Hi,
>>   I'm testing the HA auto-failover within hadoop-2.2.0
>>
>>   The cluster can be failed over manually; however, the automatic
>> failover fails.
>> I set up HA according to this URL:
>>
>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>
>>   When I test automatic failover, I kill the active NN with kill -9
>> <Pid-nn>, but the standby NameNode does not change to the active state.
>>   The log from my DFSZKFailoverController is shown at [1].
>>
>>  Please help me; any suggestion will be appreciated.
>>
>>
>> Regards.
>>
>>
>> zkfc
>> log[1]----------------------------------------------------------------------------------------------------
>>
>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>> Beginning Service Fencing Process... ======
>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>> Connecting to hadoop3...
>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Connecting to hadoop3 port 22
>> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Connection established
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Remote version string: SSH-2.0-OpenSSH_5.3
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Local version string: SSH-2.0-JSCH-0.1.42
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> CheckCiphers:
>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes256-ctr is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes192-ctr is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes256-cbc is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes192-cbc is not available.
>> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> arcfour256 is not available.
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_KEXINIT sent
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_KEXINIT received
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> kex: server->client aes128-ctr hmac-md5 none
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> kex: client->server aes128-ctr hmac-md5 none
>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_KEXDH_INIT sent
>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> expecting SSH_MSG_KEXDH_REPLY
>> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> ssh_rsa_verify: signature true
>> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Permanently added 'hadoop3' (RSA) to the list of known hosts.
>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_NEWKEYS sent
>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_NEWKEYS received
>> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_SERVICE_REQUEST sent
>> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_SERVICE_ACCEPT received
>> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Authentications that can continue:
>> gssapi-with-mic,publickey,keyboard-interactive,password
>> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Next authentication method: gssapi-with-mic
>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Authentications that can continue: publickey,keyboard-interactive,password
>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Next authentication method: publickey
>> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Disconnecting from hadoop3 port 22
>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>> Unable to connect to hadoop3 as user hadoop
>> com.jcraft.jsch.JSchException: Auth fail
>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>     at
>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to
>> fence service by any configured method.
>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController:
>> Local service NameNode at hadoop2/10.7.23.125:8020 entered state:
>> SERVICE_NOT_RESPONDING
>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>> Exception handling the winning of election
>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>> 10.7.23.124:8020
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>> Trying to re-establish ZK session
>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>> 0x2429313c808025b closed
>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181
>> sessionTimeout=5000
>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server hadoop3/10.7.23.124:2181. Will not attempt
>> to authenticate using SASL (Unable to locate a login configuration)
>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>> connection established to hadoop3/10.7.23.124:2181, initiating session
>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid =
>> 0x3429312ba330262, negotiated timeout = 5000
>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread
>> shut down
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>> Session connected.
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController:
>> Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and
>> marking that fencing is necessary
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>> Yielding from election
>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>> 0x3429312ba330262 closed
>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>> Ignoring stale result from old client with sessionId 0x3429312ba330262
>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread
>> shut down
>>
>
>

Re: auto-failover does not work

Posted by YouPeng Yang <yy...@gmail.com>.
Hi Pavan,


  I'm using sshfence. Here are my config files:

------core-site.xml-----------------

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://lklcluster</value>
    <final>true</final>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp2</value>
  </property>
</configuration>


-------hdfs-site.xml-------------

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/namedir2</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/datadir2</value>
  </property>

  <property>
    <name>dfs.nameservices</name>
    <value>lklcluster</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.lklcluster</name>
    <value>nn1,nn2</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.lklcluster.nn1</name>
    <value>hadoop2:8020</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.lklcluster.nn2</name>
    <value>hadoop3:8020</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.lklcluster.nn1</name>
    <value>hadoop2:50070</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.lklcluster.nn2</name>
    <value>hadoop3:50070</value>
  </property>

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/lklcluster</value>
  </property>

  <property>
    <name>dfs.client.failover.proxy.provider.lklcluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>5000</value>
  </property>

  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/journal/data</value>
  </property>

  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>

</configuration>
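
The zkfc log above fails at the publickey step with com.jcraft.jsch.JSchException:
Auth fail, so the first thing to verify is that the user running the ZKFC can log
in to the peer NameNode with the key named in dfs.ha.fencing.ssh.private-key-files.
A minimal check from the shell, assuming the ZKFC runs as user hadoop (the log
says "Unable to connect to hadoop3 as user hadoop"):

# on hadoop2, as the user that starts the DFSZKFailoverController
ssh -i /home/hadoop/.ssh/id_rsa -o BatchMode=yes hadoop@hadoop3 hostname

# if that prompts for a password or is refused, install the public key
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@hadoop3

# sshfence cannot answer a passphrase prompt, so the key must be
# unencrypted; this prints the public key only when no passphrase is set
ssh-keygen -y -f /home/hadoop/.ssh/id_rsa

The same check is needed in the reverse direction (hadoop3 to hadoop2), since
either node may have to fence the other after a failover.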


2013/12/2 Pavan Kumar Polineni <sm...@gmail.com>

> Post your config files and in which method you are following for automatic
> failover
>
>
> On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com>wrote:
>
>> Hi i
>>   I'm testing the HA auto-failover within hadoop-2.2.0
>>
>>   The cluster can be manully failover ,however failed with the automatic
>> failover.
>> I setup the HA according to  the URL
>>
>> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>>
>>   When I test the automatic failover, I killed my active NN by kill -9
>> <Pid-nn>,while the standby namenode does not change to active state.
>>   It came out the log in my DFSZKFailoverController as [1]
>>
>>  Please help me ,any suggestion will be appreciated.
>>
>>
>> Regards.
>>
>>
>> zkfc
>> log[1]----------------------------------------------------------------------------------------------------
>>
>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
>> Beginning Service Fencing Process... ======
>> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
>> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
>> Connecting to hadoop3...
>> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Connecting to hadoop3 port 22
>> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Connection established
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Remote version string: SSH-2.0-OpenSSH_5.3
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Local version string: SSH-2.0-JSCH-0.1.42
>> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> CheckCiphers:
>> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes256-ctr is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes192-ctr is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes256-cbc is not available.
>> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> aes192-cbc is not available.
>> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> arcfour256 is not available.
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_KEXINIT sent
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_KEXINIT received
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> kex: server->client aes128-ctr hmac-md5 none
>> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> kex: client->server aes128-ctr hmac-md5 none
>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_KEXDH_INIT sent
>> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> expecting SSH_MSG_KEXDH_REPLY
>> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> ssh_rsa_verify: signature true
>> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Permanently added 'hadoop3' (RSA) to the list of known hosts.
>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_NEWKEYS sent
>> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_NEWKEYS received
>> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_SERVICE_REQUEST sent
>> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> SSH_MSG_SERVICE_ACCEPT received
>> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Authentications that can continue:
>> gssapi-with-mic,publickey,keyboard-interactive,password
>> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Next authentication method: gssapi-with-mic
>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Authentications that can continue: publickey,keyboard-interactive,password
>> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Next authentication method: publickey
>> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
>> Disconnecting from hadoop3 port 22
>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
>> Unable to connect to hadoop3 as user hadoop
>> com.jcraft.jsch.JSchException: Auth fail
>>     at com.jcraft.jsch.Session.connect(Session.java:452)
>>     at
>> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
>> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
>> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to
>> fence service by any configured method.
>> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController:
>> Local service NameNode at hadoop2/10.7.23.125:8020 entered state:
>> SERVICE_NOT_RESPONDING
>> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>> Exception handling the winning of election
>> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
>> 10.7.23.124:8020
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>>     at
>> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>>     at
>> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>>     at
>> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>> Trying to re-establish ZK session
>> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
>> 0x2429313c808025b closed
>> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181
>> sessionTimeout=5000
>> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server hadoop3/10.7.23.124:2181. Will not attempt
>> to authenticate using SASL (Unable to locate a login configuration)
>> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
>> connection established to hadoop3/10.7.23.124:2181, initiating session
>> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
>> establishment complete on server hadoop3/10.7.23.124:2181, sessionid =
>> 0x3429312ba330262, negotiated timeout = 5000
>> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread
>> shut down
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>> Session connected.
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController:
>> Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and
>> marking that fencing is necessary
>> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
>> Yielding from election
>> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
>> 0x3429312ba330262 closed
>> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector:
>> Ignoring stale result from old client with sessionId 0x3429312ba330262
>> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread
>> shut down
>>
>
>

Re: auto-failover does not work

Posted by Pavan Kumar Polineni <sm...@gmail.com>.
Post your config files and tell us which method you are following for
automatic failover.


On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com>wrote:

> Hi i
>   I'm testing the HA auto-failover within hadoop-2.2.0
>
>   The cluster can be manully failover ,however failed with the automatic
> failover.
> I setup the HA according to  the URL
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>
>   When I test the automatic failover, I killed my active NN by kill -9
> <Pid-nn>,while the standby namenode does not change to active state.
>   It came out the log in my DFSZKFailoverController as [1]
>
>  Please help me ,any suggestion will be appreciated.
>
>
> Regards.
>
>
> zkfc
> log[1]----------------------------------------------------------------------------------------------------
>
> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: ======
> Beginning Service Fencing Process... ======
> 2013-12-02 19:49:28,588 INFO org.apache.hadoop.ha.NodeFencer: Trying
> method 1/1: org.apache.hadoop.ha.SshFenceByTcpPort(null)
> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort:
> Connecting to hadoop3...
> 2013-12-02 19:49:28,590 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Connecting to hadoop3 port 22
> 2013-12-02 19:49:28,592 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Connection established
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Remote version string: SSH-2.0-OpenSSH_5.3
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Local version string: SSH-2.0-JSCH-0.1.42
> 2013-12-02 19:49:28,603 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> CheckCiphers:
> aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> aes256-ctr is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> aes192-ctr is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> aes256-cbc is not available.
> 2013-12-02 19:49:28,608 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> aes192-cbc is not available.
> 2013-12-02 19:49:28,609 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> arcfour256 is not available.
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> SSH_MSG_KEXINIT sent
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> SSH_MSG_KEXINIT received
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> kex: server->client aes128-ctr hmac-md5 none
> 2013-12-02 19:49:28,610 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> kex: client->server aes128-ctr hmac-md5 none
> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> SSH_MSG_KEXDH_INIT sent
> 2013-12-02 19:49:28,617 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> expecting SSH_MSG_KEXDH_REPLY
> 2013-12-02 19:49:28,634 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> ssh_rsa_verify: signature true
> 2013-12-02 19:49:28,635 WARN org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Permanently added 'hadoop3' (RSA) to the list of known hosts.
> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> SSH_MSG_NEWKEYS sent
> 2013-12-02 19:49:28,635 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> SSH_MSG_NEWKEYS received
> 2013-12-02 19:49:28,636 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> SSH_MSG_SERVICE_REQUEST sent
> 2013-12-02 19:49:28,637 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> SSH_MSG_SERVICE_ACCEPT received
> 2013-12-02 19:49:28,638 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Authentications that can continue:
> gssapi-with-mic,publickey,keyboard-interactive,password
> 2013-12-02 19:49:28,639 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Next authentication method: gssapi-with-mic
> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Authentications that can continue: publickey,keyboard-interactive,password
> 2013-12-02 19:49:28,642 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Next authentication method: publickey
> 2013-12-02 19:49:28,644 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch:
> Disconnecting from hadoop3 port 22
> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.SshFenceByTcpPort:
> Unable to connect to hadoop3 as user hadoop
> com.jcraft.jsch.JSchException: Auth fail
>     at com.jcraft.jsch.Session.connect(Session.java:452)
>     at
> org.apache.hadoop.ha.SshFenceByTcpPort.tryFence(SshFenceByTcpPort.java:100)
>     at org.apache.hadoop.ha.NodeFencer.fence(NodeFencer.java:97)
>     at
> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:521)
>     at
> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at
> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at
> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at
> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-02 19:49:28,644 WARN org.apache.hadoop.ha.NodeFencer: Fencing
> method org.apache.hadoop.ha.SshFenceByTcpPort(null) was unsuccessful.
> 2013-12-02 19:49:28,645 ERROR org.apache.hadoop.ha.NodeFencer: Unable to
> fence service by any configured method.
> 2013-12-02 19:49:28,645 INFO org.apache.hadoop.ha.ZKFailoverController:
> Local service NameNode at hadoop2/10.7.23.125:8020 entered state:
> SERVICE_NOT_RESPONDING
> 2013-12-02 19:49:28,646 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Exception handling the winning of election
> java.lang.RuntimeException: Unable to fence NameNode at hadoop3/
> 10.7.23.124:8020
>     at
> org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:522)
>     at
> org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:494)
>     at
> org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:59)
>     at
> org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:837)
>     at
> org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:900)
>     at
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:799)
>     at
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
>     at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> 2013-12-02 19:49:28,646 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Trying to re-establish ZK session
> 2013-12-02 19:49:28,669 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x2429313c808025b closed
> 2013-12-02 19:49:29,672 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=hadoop1:2181,hadoop2:2181,hadoop3:2181
> sessionTimeout=5000
> watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@3545fe3b
> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Opening
> socket connection to server hadoop3/10.7.23.124:2181. Will not attempt to
> authenticate using SASL (Unable to locate a login configuration)
> 2013-12-02 19:49:29,675 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to hadoop3/10.7.23.124:2181, initiating session
> 2013-12-02 19:49:29,699 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server hadoop3/10.7.23.124:2181, sessionid =
> 0x3429312ba330262, negotiated timeout = 5000
> 2013-12-02 19:49:29,702 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Session connected.
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ZKFailoverController:
> Quitting master election for NameNode at hadoop2/10.7.23.125:8020 and
> marking that fencing is necessary
> 2013-12-02 19:49:29,706 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Yielding from election
> 2013-12-02 19:49:29,727 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x3429312ba330262 closed
> 2013-12-02 19:49:29,728 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Ignoring stale result from old client with sessionId 0x3429312ba330262
> 2013-12-02 19:49:29,728 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
>

Re: auto-failover does not work

Posted by Jitendra Yadav <je...@gmail.com>.
Which fencing method are you using in your configuration? Do you have a
correct ssh configuration between your hosts?


Regards
Jitendra
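
If the ssh setup cannot be repaired right away, another option with QJM-based
HA is to list a second fencing method that always succeeds, so that a failed
ssh attempt does not leave the standby stuck as in the log above. This is only
a sketch; the shell(/bin/true) fallback is a suggestion, not part of the
configuration posted here:

<property>
  <name>dfs.ha.fencing.methods</name>
  <!-- methods are tried in order, one per line; shell(/bin/true)
       always returns success, so the election can proceed even
       when the sshfence attempt fails -->
  <value>sshfence
shell(/bin/true)</value>
</property>

With QJM the JournalNodes accept edits only from the writer holding the newest
epoch, which is why this fallback is generally considered safe there; it does
mean the old active process is never actually killed, so it still has to be
cleaned up by hand.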


On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com>wrote:

> Hi i
>   I'm testing the HA auto-failover within hadoop-2.2.0
>
>   The cluster can be manully failover ,however failed with the automatic
> failover.
> I setup the HA according to  the URL
>
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
>
>   When I test the automatic failover, I killed my active NN by kill -9
> <Pid-nn>,while the standby namenode does not change to active state.
>   It came out the log in my DFSZKFailoverController as [1]
>
> [remainder of the original message and zkfc log snipped]

Re: auto-failover does not work

Posted by Pavan Kumar Polineni <sm...@gmail.com>.
Please post your config files and tell us which method you are following for
automatic failover.
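
For reference, these settings live in hdfs-site.xml and core-site.xml. Here
is a minimal sketch of the relevant properties, not your actual config: the
private-key path is an assumption, and the ZooKeeper quorum is copied from
the connectString in your log:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <!-- private key the ZKFC uses to ssh into the other NameNode -->
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>

    <!-- core-site.xml -->
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
    </property>

If dfs.ha.fencing.ssh.private-key-files is missing or points at the wrong
key, sshfence cannot authenticate and every failover that requires fencing
will fail.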


On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com> wrote:

> [original message and zkfc log snipped]

Re: auto-failover does not work

Posted by Jitendra Yadav <je...@gmail.com>.
Which fencing method are you using in your configuration? Do you have a
correct SSH configuration between your hosts?
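
The "Auth fail" in your log means the ZKFC on hadoop2 reached hadoop3 on
port 22 but could not authenticate as user hadoop by public key, so fencing
aborted and the standby quit the election instead of going active. As a
quick sanity check (host and user names are taken from your log; the key
path is an assumption), passwordless SSH must work non-interactively in
both directions:

    # on hadoop2, as the user that runs the ZKFC (here: hadoop)
    hdfs getconf -confKey dfs.ha.fencing.methods        # confirm sshfence is configured
    ssh-keygen -t rsa                                   # only if no key pair exists yet
    ssh-copy-id hadoop@hadoop3                          # install the public key on hadoop3
    ssh -i /home/hadoop/.ssh/id_rsa hadoop@hadoop3 true # must succeed without a password prompt

Repeat from hadoop3 to hadoop2, since either ZKFC may need to fence the
other NameNode.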


Regards
Jitendra


On Mon, Dec 2, 2013 at 5:34 PM, YouPeng Yang <yy...@gmail.com> wrote:

> [original message and zkfc log snipped]
