Posted to user@hadoop.apache.org by "Paul K. Harter, Jr." <Pa...@oracle.com> on 2014/04/22 23:49:13 UTC
Network partitions and Failover Times
I am trying to understand the mechanisms and timing involved when Hadoop is
faced with a network partition. Suppose we have a large Hadoop cluster
configured with automatic failover:
1) Active NameNode
2) Standby NameNode
3) Quorum journal nodes (which we'll ignore for now)
4) Zookeeper ensemble with 3 nodes
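For concreteness, here is a sketch of the ZooKeeper server settings the
questions below refer to (all values and hostnames are illustrative, not
from any particular cluster):

```
# zoo.cfg (illustrative values; tickTime is in milliseconds)
tickTime=2000
initLimit=10
syncLimit=5
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
```

With tickTime=2000, "5 ticks" of syncLimit is 10 seconds of wall-clock time.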
Suppose the ZooKeeper session from the active NameNode happens to be
connected directly to the ZK leader node, and that the system experiences a
network failure resulting in two partitions (A and B) with the nodes
distributed as follows:
A) ZooKeeper leader node;
   Active NameNode
B) 2 ZooKeeper followers;
   Standby NameNode
QUESTIONS:
It seems the result should be that both ZooKeeper leadership and the active
NameNode role fail over to partition B, but I wanted to confirm the sequence
of actions as outlined below. Does this look right?
If the network failure occurs at time zero, how long should this whole
sequence take if, for example, syncLimit is 5 ticks and the NameNode
sessionTimeout is 10 ticks?
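As a back-of-the-envelope check on that timing question, here is a small
Python sketch (my own model, not Hadoop or ZooKeeper code) that adds up the
bounds under the stated settings, assuming tickTime = 2000 ms and an assumed
leader-election cost of a couple of seconds:

```python
# Back-of-the-envelope failover timeline (illustrative model only).
# All times in milliseconds, measured from the network partition at t = 0.

TICK_MS = 2000          # zoo.cfg tickTime (assumed value)
SYNC_LIMIT = 5          # ticks, per the question
SESSION_TIMEOUT = 10    # ticks: the NameNode's ZK session timeout, per the question
ELECTION_MS = 2000      # assumed cost of leader election on the majority side

# 1) Minority-side leader notices it has lost quorum and steps down.
leader_steps_down = SYNC_LIMIT * TICK_MS

# 2) Majority side elects a new leader (it, too, must first notice the old
#    leader is gone, bounded by roughly the same syncLimit window).
new_leader_ready = SYNC_LIMIT * TICK_MS + ELECTION_MS

# 3) The old active's session expires: the quorum must both have a working
#    leader and have seen no heartbeats for a full sessionTimeout.
session_expired = max(new_leader_ready, SESSION_TIMEOUT * TICK_MS)

# 4) Expiry deletes the ephemeral lock znode, the standby's watch fires,
#    and it takes over (fencing and NN state transition not modeled here).
print(f"leader steps down:   ~{leader_steps_down / 1000:.0f} s")
print(f"new ZK leader ready: ~{new_leader_ready / 1000:.0f} s")
print(f"session expired:     ~{session_expired / 1000:.0f} s")
```

Under these assumptions the standby's watch would fire roughly 20 seconds
after the partition, dominated by the session timeout rather than by leader
election.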
FAILOVER SEQUENCE (as I understand it):
1) The leader, who ends up in the minority, loses its connection to the
   remaining servers.
2) After syncLimit ticks, the ZK ensemble realizes there's a problem.
   Ordinarily, when a follower loses its connection, it is dropped by the
   leader and no longer participates in voting. In this case, however, the
   leader no longer has a quorum, so it has to relinquish leadership. It
   stops responding to client requests, enters the LOOKING state, and
   starts trying to form/join a quorum; all of its clients are notified
   with a DISCONNECTED event. (Or is it that the DISCONNECTED event is
   delivered to the client library, which then delivers connection-loss
   exceptions to clients?)
   The remaining nodes on the majority side enter leader election and
   choose a new leader (which starts a new epoch) on the majority side.
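One way to keep the parenthetical question in step 2 straight is to note
that DISCONNECTED is a client-side observation (my server is unreachable;
the session may still be alive on the quorum side), while EXPIRED is a
quorum-side verdict (the session is dead and its ephemerals are gone). A
toy Python model of how a lock-holding active node should react (my own
illustration, not the ZooKeeper or ZKFC API):

```python
# Toy model of how a lock-holding active node should react to ZooKeeper
# session events. Illustrative only; not the ZooKeeper or ZKFC API.

def react(state, event):
    """state: 'active', 'quiesced', or 'lost'; returns the next state."""
    if event == "DISCONNECTED":
        # The client library lost its server. The session MAY still be
        # alive on the quorum side, so pessimistically stop acting active.
        return "quiesced" if state == "active" else state
    if event == "EXPIRED":
        # Quorum-side verdict: the session is dead and the ephemeral lock
        # znode is gone. The node must re-run the election to come back.
        return "lost"
    if event == "CONNECTED":
        # Reconnected within the session timeout: the session (and its
        # ephemeral lock) survived, so it is safe to resume as active.
        return "active" if state == "quiesced" else state
    return state

print(react("active", "DISCONNECTED"))  # quiesced
```

The asymmetry is the point: DISCONNECTED is recoverable if the client
reconnects in time, EXPIRED never is.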
3) All clients that were connected to the (now former) leader try to
   reconnect, and will either fail, if they can't reach a node on the new
   majority side, or succeed in connecting to a node in the new quorum.
4) Meanwhile, when the Active NameNode is informed that its server has
   become disconnected (DISCONNECTED event), it must stop acting as the
   Active NameNode.
   Once the ZK quorum re-forms, it stops seeing heartbeats from the
   (formerly) Active NameNode and will eventually (after sessionTimeout)
   declare its session dead. This deletes the ephemeral znode being used
   to hold its lock on "Active" status, which triggers the watch set by
   the Standby NameNode.
   The Standby then competes in the active-NameNode election, and should
   win and become the new Active.
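The ephemeral-lock handoff in step 4 can be sketched as a toy in-memory
model (my own illustration; the real implementation is Hadoop's
ActiveStandbyElector on top of ZooKeeper, and the session names nn1/nn2
here are hypothetical):

```python
# Toy in-memory model of the ephemeral-lock handoff in step 4.
# Illustrative only; not ZooKeeper or Hadoop code.

class LockZnode:
    def __init__(self):
        self.holder = None    # session currently holding the lock znode
        self.watchers = []    # standbys watching for the znode to vanish

    def try_acquire(self, session):
        if self.holder is None:
            self.holder = session       # ephemeral create succeeded
            return True
        self.watchers.append(session)   # lost: set a watch instead
        return False

    def expire_session(self, session):
        # Session expiry deletes the ephemeral znode and fires the watches;
        # each watcher re-runs the election by trying to acquire the lock.
        if self.holder == session:
            self.holder = None
            waiting, self.watchers = self.watchers, []
            for s in waiting:
                self.try_acquire(s)

lock = LockZnode()
lock.try_acquire("nn1")      # active NameNode wins the lock
lock.try_acquire("nn2")      # standby loses, watches the znode
lock.expire_session("nn1")   # partitioned active's session expires
print(lock.holder)           # nn2
```

Note that nothing here requires the old active to cooperate: the handoff is
driven entirely by session expiry on the majority side, which is why the
fencing step matters for a partitioned (rather than dead) active.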