You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-user@hadoop.apache.org by ch huang <ju...@gmail.com> on 2013/11/04 02:21:49 UTC

issue about NN autofailover happened each 24 hours

hi,all:
     i have a 5node hadoop cluster ,and i deploy  HA autofailover . but i
find the failover happened each 24 hours,i do not know why


here is my NN log , zkfc log ,zookeeper log content:

*NN log:*
2013-11-03 23:28:36,740 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
147925
2013-11-03 23:30:10,103 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services
started for active state
2013-11-03 23:30:10,103 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 147925
2013-11-03 23:30:10,104 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 1Number of transactions batched in Sync
s: 0 Number of syncs: 1 SyncTimes(ms): 11 52 55
2013-11-03 23:30:10,170 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2
Total time for transactions(ms): 1Number of transactions batched in Sync
s: 0 Number of syncs: 2 SyncTimes(ms): 33 74 77
2013-11-03 23:30:10,222 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file /data/namespace/1/current/edits_inprogress_0000000000000147925 -> /
data/namespace/1/current/edits_0000000000000147925-0000000000000147926
2013-11-03 23:30:10,222 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file /data/namespace/2/current/edits_inprogress_0000000000000147925 -> /
data/namespace/2/current/edits_0000000000000147925-0000000000000147926
2013-11-03 23:30:10,222 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services
required for standby state
2013-11-03 23:30:10,224 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on
active node at ch12/192.168.11.12:8020 every 120 seconds.
2013-11-03 23:30:10,224 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting
standby checkpoint thread...
Checkpointing active NN at ch12:50070
Serving checkpoints at ch11/192.168.11.11:50070
2013-11-03 23:30:10,225 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Op
eration category WRITE is not supported in state standby
2013-11-03 23:30:10,225 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from
192.168.11.13:54864:
 error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is
not supported in state standby
2013-11-03 23:30:11,524 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Op
eration category WRITE is not supported in state standby
2013-11-03 23:30:11,524 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from
192.168.11.13:54864:
 error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is
not supported in state standby
2013-11-03 23:30:15,956 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException: Op
eration category WRITE is not supported in state standby
2013-11-03 23:30:15,956 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from
192.168.11.15:56050: error: org.apache.hadoop.ipc.StandbyException:
Operation category WRITE is not supported in state standby
2013-11-03 23:30:19,902 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException:
Operation category WRITE is not supported in state standby
2013-11-03 23:30:19,902 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from
192.168.11.12:45839: error: org.apache.hadoop.ipc.StandbyException:
Operation category WRITE is not supported in state standby
2013-11-03 23:30:21,386 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException:
Operation category READ is not supported in state standby
2013-11-03 23:30:21,387 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from
192.168.11.11:49854: error: org.apache.hadoop.ipc.StandbyException:
Operation category READ is not supported in state standby
2013-11-03 23:30:25,215 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hbase (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException:
Operation category WRITE is not supported in state standby
2013-11-03 23:30:25,215 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from
192.168.11.14:40328: error: org.apache.hadoop.ipc.StandbyException:
Operation category WRITE is not supported in state standby
2013-11-03 23:30:49,513 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:mapred (auth:SIMPLE) cause:org.apache.hadoop.ipc.StandbyException:
Operation category READ is not supported in state standby
2013-11-03 23:30:49,513 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 8020, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.getListing from
192.168.11.18:55750: error: org.apache.hadoop.ipc.StandbyException:
Operation category READ is not supported in state standby
2013-11-03 23:32:10,326 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log
roll on remote NameNode ch12/192.168.11.12:8020
2013-11-03 23:32:10,741 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@172efb72expecting
start txid #147927
2013-11-03 23:32:10,741 INFO
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: Fast-forwarding
stream '
http://ch12:8480/getJournal?jid=product&segmentTxId=147927&storageInfo=-40%3A2086828354%3A0%3ACID-0e777b8c-19f3-44a1-8af1-916877f2506c,
http://ch11:8480/getJournal?jid=product&segmentTxId=147927&storageInfo=-40%3A2086828354%3A0%3ACID-0e777b8c-19f3-44a1-8af1-916877f2506c'to
transaction ID 147927

*ZKFC log:*
2013-11-02 23:30:09,669 INFO org.apache.zookeeper.ClientCnxn: Unable to
read additional data from server sessionid 0x41e34ab04a0192, likely server
has closed socket, closing socket connection and attempting reconnect
2013-11-02 23:30:09,771 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session disconnected. Entering neutral mode...
2013-11-02 23:30:09,982 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ch12/192.168.11.12:2181. Will not attempt to
authenticate using SASL (unknown error)
2013-11-02 23:30:09,982 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to ch12/192.168.11.12:2181, initiating session
2013-11-02 23:30:09,985 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x41e34ab04a0192 has expired,
closing socket connection
2013-11-02 23:30:09,986 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session expired. Entering neutral mode and rejoining...
2013-11-02 23:30:09,986 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Trying to re-establish ZK session
2013-11-02 23:30:09,987 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=ch11:2181,ch12:2181,ch13:2181
sessionTimeout=5000
watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@e6eb495
2013-11-02 23:30:09,988 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ch11/192.168.11.11:2181. Will not attempt to
authenticate using SASL (unknown error)
2013-11-02 23:30:09,989 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to ch11/192.168.11.11:2181, initiating session
2013-11-02 23:30:09,996 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server ch11/192.168.11.11:2181, sessionid =
0x41e34ab04a023a, negotiated timeout = 5000
2013-11-02 23:30:09,999 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2013-11-02 23:30:10,000 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session connected.
2013-11-02 23:30:10,005 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Checking for any old active which needs to be fenced...
2013-11-02 23:30:10,007 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old
node exists: 0a0770726f6475637412036e6e321a046368313220d43e28d33e
2013-11-02 23:30:10,008 INFO org.apache.hadoop.ha.ZKFailoverController:
Should fence: NameNode at ch12/192.168.11.12:8020
2013-11-02 23:30:10,110 INFO org.apache.hadoop.ha.ZKFailoverController:
Successfully transitioned NameNode at ch12/192.168.11.12:8020 to standby
state without fencing
2013-11-02 23:30:10,110 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Writing znode /hadoop-ha/product/ActiveBreadCrumb to indicate that the
local node is the most recent active...
2013-11-02 23:30:10,117 INFO org.apache.hadoop.ha.ZKFailoverController:
Trying to make NameNode at ch11/192.168.11.11:8020 active...
2013-11-02 23:30:11,736 INFO org.apache.hadoop.ha.ZKFailoverController:
Successfully transitioned NameNode at ch11/192.168.11.11:8020 to active
state
2013-11-03 23:30:09,687 INFO org.apache.zookeeper.ClientCnxn: Unable to
read additional data from server sessionid 0x41e34ab04a023a, likely server
has closed socket, closing socket connection and attempting reconnect
2013-11-03 23:30:09,789 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session disconnected. Entering neutral mode...
2013-11-03 23:30:10,695 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ch12/192.168.11.12:2181. Will not attempt to
authenticate using SASL (unknown error)
2013-11-03 23:30:10,696 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to ch12/192.168.11.12:2181, initiating session
2013-11-03 23:30:10,698 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x41e34ab04a023a has expired,
closing socket connection
2013-11-03 23:30:10,699 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session expired. Entering neutral mode and rejoining...
2013-11-03 23:30:10,699 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Trying to re-establish ZK session
2013-11-03 23:30:10,701 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=ch11:2181,ch12:2181,ch13:2181
sessionTimeout=5000
watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@52192617
2013-11-03 23:30:10,703 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server ch11/192.168.11.11:2181. Will not attempt to
authenticate using SASL (unknown error)
2013-11-03 23:30:10,703 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to ch11/192.168.11.11:2181, initiating session
2013-11-03 23:30:10,711 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server ch11/192.168.11.11:2181, sessionid =
0x41e34ab04a02d3, negotiated timeout = 5000
2013-11-03 23:30:10,713 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2013-11-03 23:30:10,715 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session connected.
2013-11-03 23:30:10,721 INFO org.apache.hadoop.ha.ZKFailoverController: ZK
Election indicated that NameNode at ch11/192.168.11.11:8020 should become
standby
2013-11-03 23:30:10,725 INFO org.apache.hadoop.ha.ZKFailoverController:
Successfully transitioned NameNode at ch11/192.168.11.11:8020 to standby
state
2013-11-03 23:30:10,799 WARN org.apache.hadoop.ha.ActiveStandbyElector:
Ignoring stale result from old client with sessionId 0x41e34ab04a023a

*zookeeper log:*
2013-11-03 23:29:41,109 [myid:0] - INFO  [Thread-33356:NIOServerCnxn@1001]
- Closed socket connection for client /192.168.11.142:54260 (no session
established for client)
2013-11-03 23:30:04,777 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.11.12:52797
2013-11-03 23:30:04,778 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew
session 0x141e46bfac70233 at /192.168.11.12:52797
2013-11-03 23:30:04,778 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x141e46bfac70233
2013-11-03 23:30:04,779 [myid:0] - INFO
[QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@595] - Established
session 0x141e46bfac70233 with negotiated timeout 5000 for client /
192.168.11.12:52797
2013-11-03 23:30:09,682 [myid:0] - INFO
[CommitProcessor:0:NIOServerCnxn@1001] - Closed socket connection for
client /192.168.11.12:52797 which had sessionid 0x141e46bfac70233
2013-11-03 23:30:09,687 [myid:0] - INFO
[CommitProcessor:0:NIOServerCnxn@1001] - Closed socket connection for
client /192.168.11.11:48112 which had sessionid 0x41e34ab04a023a
2013-11-03 23:30:10,703 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /192.168.11.11:57037
2013-11-03 23:30:10,703 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:ZooKeeperServer@839] - Client attempting to establish
new session at /192.168.11.11:57037
2013-11-03 23:30:10,711 [myid:0] - INFO
[CommitProcessor:0:ZooKeeperServer@595] - Established session
0x41e34ab04a02d3 with negotiated timeout 5000 for client /
192.168.11.11:57037
2013-11-03 23:30:18,008 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /127.0.0.1:60661
2013-11-03 23:30:18,008 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing mntr command from /
127.0.0.1:60661
2013-11-03 23:30:18,009 [myid:0] - INFO  [Thread-33357:NIOServerCnxn@1001]
- Closed socket connection for client /127.0.0.1:60661 (no session
established for client)
2013-11-03 23:30:38,008 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /127.0.0.1:60664
2013-11-03 23:30:38,008 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@821] - Processing mntr command from /
127.0.0.1:60664
2013-11-03 23:30:38,009 [myid:0] - INFO  [Thread-33358:NIOServerCnxn@1001]
- Closed socket connection for client /127.0.0.1:60664 (no session
established for client)
2013-11-03 23:31:18,008 [myid:0] - INFO  [NIOServerCxn.Factory:
0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection
from /127.0.0.1:60670