You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by Flavio Junqueira <fp...@yahoo-inc.com> on 2009/11/06 16:10:30 UTC

Re: Possible race in LETest.java

Hi Henry, Apologies for the the delay. Your observation sounds right  
to me. Here is how I'm reading it; let me know if it makes sense.

If everyone votes for 3 in the second round and 3 has crashed, then in  
countVotes we will remove all votes to 3 and there will be no vote  
left. In such a case, there will be no winner as a result of the call  
to countVotes and lookForLeader won't change the current vote  
(LeaderElection.java:201). This is a situation in which we are stuck.

Does it sound reasonable to add an "else" to the "if" statement of  
LeaderElection.java:201 to reset the vote? This modification would  
implementing resetting the vote when countVotes returns no winner,  
which should happen only when the replica itself votes for a dead  
leader.

-Flavio

On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote:

> [ Sending this direct since the Apache mailserver is rejecting my e- 
> mails at the moment ]
>
> As I understand it, 1 and 2 receive a vote for 3 in the first round,  
> which causes them to vote for 3 in the second round. So in the  
> second round, all votes cast are for 3. But 3 has died, so all votes  
> for it are discounted. 1 and 2 continue to vote for 3 ad infinitum,  
> never resetting their vote.
>
> Does this sound plausible, or am I missing something?
>
> cheers,
> Henry
>
> On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <fpj@yahoo- 
> inc.com> wrote:
> Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in  
> your situation. If they exclude 3 in countVotes, then countVotes  
> will end up returning 2 and not 3, assuming there is a vote for 2.  
> What am I missing?
>
> The problem with QuorumPeer you're pointing at was also an issue  
> with the FLE tests, and I couldn't see an easy way around it other  
> than timing out and restarting leader election.
>
> Cheers,
> -Flavio
>
>
> On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:
>
> I've been working on adding a TCPResponderThread to the leader  
> election
> process so that if a deployment needs to be TCP only, it can be and  
> still
> use all election types. Testing this has exposed what might be a race
> condition in the leader election code that prevents a leader from  
> being
> elected.
>
> Here's the behaviour I see in LETest occasionally. With three nodes  
> (reduced
> from 30 for ease of debugging), node 3 gets elected before either  
> node 1 or
> node 2 finish their election (there is one round where each node  
> that 3 has
> the highest id, and then 3 completes its second round by receiving  
> votes for
> itself from 1 and 2, but 1 and 2 do not receive votes from 3).
>
> Now 3 is killed by the test harness. 1 and 2 are still voting for  
> it, but
> every time they try, the vote tally excludes 3 since it hasn't been  
> heard
> from. They then spin round the voting process, unable to reset their  
> vote. I
> expect that the heartbeat mechanism in a running QuorumPeer takes  
> care of
> this when the leader is lost, but the associated QuorumPeers aren't  
> running.
>
> If this is the case, then there is a simple fix to reset the nodes  
> vote to
> themselves if they are voting for a node that hasn't been heard  
> from. I
> don't know why using TCP instead of UDP for the responder thread is
> exacerbating this (and we can't rule out my introducing a bug :));  
> but as
> it's a race condition the different timings associated with waiting  
> on a TCP
> socket might just be enough to expose the issue.
>
> Can someone verify this might be possible / figure out what I missed?
>
> cheers,
> Henry
>
>


Re: Possible race in LETest.java

Posted by Patrick Hunt <ph...@apache.org>.
Ok, great. Good to see you guys are on top of this one!

Patrick

Henry Robinson wrote:
> The patch is simple, but a test is harder because it's a race condition. I'm
> working on it.
> 
> Henry
> 
> On Wed, Nov 11, 2009 at 12:06 AM, Flavio Junqueira <fp...@yahoo-inc.com>wrote:
> 
>> Henry already opened one: ZOOKEEPER-569.
>>
>> -Flavio
>>
>>
>> On Nov 11, 2009, at 7:03 AM, Patrick Hunt wrote:
>>
>>  Closing the loop - what's the status on this? Can one of you open a
>>> JIRA and provide a patch for this?
>>>
>>> Thanks,
>>>
>>> Patrick
>>>
>>> Flavio Junqueira wrote:
>>>
>>>> Hi Henry, Apologies for the the delay. Your observation sounds right to
>>>> me. Here is how I'm reading it; let me know if it makes sense.
>>>>
>>>> If everyone votes for 3 in the second round and 3 has crashed, then in
>>>> countVotes we will remove all votes to 3 and there will be no vote left.
>>>> In such a case, there will be no winner as a result of the call to
>>>> countVotes and lookForLeader won't change the current vote
>>>> (LeaderElection.java:201). This is a situation in which we are stuck.
>>>>
>>>> Does it sound reasonable to add an "else" to the "if" statement of
>>>> LeaderElection.java:201 to reset the vote? This modification would
>>>> implementing resetting the vote when countVotes returns no winner, which
>>>> should happen only when the replica itself votes for a dead leader.
>>>>
>>>> -Flavio
>>>>
>>>> On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote:
>>>>
>>>>  [ Sending this direct since the Apache mailserver is rejecting my
>>>>> e-mails at the moment ]
>>>>>
>>>>> As I understand it, 1 and 2 receive a vote for 3 in the first round,
>>>>> which causes them to vote for 3 in the second round. So in the second
>>>>> round, all votes cast are for 3. But 3 has died, so all votes for it
>>>>> are discounted. 1 and 2 continue to vote for 3 ad infinitum, never
>>>>> resetting their vote.
>>>>>
>>>>> Does this sound plausible, or am I missing something?
>>>>>
>>>>> cheers,
>>>>> Henry
>>>>>
>>>>> On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <fp...@yahoo-inc.com>
>>>>> wrote:
>>>>> Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in
>>>>> your situation. If they exclude 3 in countVotes, then countVotes will
>>>>> end up returning 2 and not 3, assuming there is a vote for 2. What am
>>>>> I missing?
>>>>>
>>>>> The problem with QuorumPeer you're pointing at was also an issue with
>>>>> the FLE tests, and I couldn't see an easy way around it other than
>>>>> timing out and restarting leader election.
>>>>>
>>>>> Cheers,
>>>>> -Flavio
>>>>>
>>>>>
>>>>> On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:
>>>>>
>>>>> I've been working on adding a TCPResponderThread to the leader election
>>>>> process so that if a deployment needs to be TCP only, it can be and
>>>>> still
>>>>> use all election types. Testing this has exposed what might be a race
>>>>> condition in the leader election code that prevents a leader from being
>>>>> elected.
>>>>>
>>>>> Here's the behaviour I see in LETest occasionally. With three nodes
>>>>> (reduced
>>>>> from 30 for ease of debugging), node 3 gets elected before either node
>>>>> 1 or
>>>>> node 2 finish their election (there is one round where each node that
>>>>> 3 has
>>>>> the highest id, and then 3 completes its second round by receiving
>>>>> votes for
>>>>> itself from 1 and 2, but 1 and 2 do not receive votes from 3).
>>>>>
>>>>> Now 3 is killed by the test harness. 1 and 2 are still voting for it,
>>>>> but
>>>>> every time they try, the vote tally excludes 3 since it hasn't been
>>>>> heard
>>>>> from. They then spin round the voting process, unable to reset their
>>>>> vote. I
>>>>> expect that the heartbeat mechanism in a running QuorumPeer takes care
>>>>> of
>>>>> this when the leader is lost, but the associated QuorumPeers aren't
>>>>> running.
>>>>>
>>>>> If this is the case, then there is a simple fix to reset the nodes
>>>>> vote to
>>>>> themselves if they are voting for a node that hasn't been heard from. I
>>>>> don't know why using TCP instead of UDP for the responder thread is
>>>>> exacerbating this (and we can't rule out my introducing a bug :)); but
>>>>> as
>>>>> it's a race condition the different timings associated with waiting on
>>>>> a TCP
>>>>> socket might just be enough to expose the issue.
>>>>>
>>>>> Can someone verify this might be possible / figure out what I missed?
>>>>>
>>>>> cheers,
>>>>> Henry
>>>>>
>>>>>
>>>>>
>>>>
> 

Re: Possible race in LETest.java

Posted by Henry Robinson <he...@cloudera.com>.
The patch is simple, but a test is harder because it's a race condition. I'm
working on it.

Henry

On Wed, Nov 11, 2009 at 12:06 AM, Flavio Junqueira <fp...@yahoo-inc.com>wrote:

> Henry already opened one: ZOOKEEPER-569.
>
> -Flavio
>
>
> On Nov 11, 2009, at 7:03 AM, Patrick Hunt wrote:
>
>  Closing the loop - what's the status on this? Can one of you open a
>> JIRA and provide a patch for this?
>>
>> Thanks,
>>
>> Patrick
>>
>> Flavio Junqueira wrote:
>>
>>> Hi Henry, Apologies for the the delay. Your observation sounds right to
>>> me. Here is how I'm reading it; let me know if it makes sense.
>>>
>>> If everyone votes for 3 in the second round and 3 has crashed, then in
>>> countVotes we will remove all votes to 3 and there will be no vote left.
>>> In such a case, there will be no winner as a result of the call to
>>> countVotes and lookForLeader won't change the current vote
>>> (LeaderElection.java:201). This is a situation in which we are stuck.
>>>
>>> Does it sound reasonable to add an "else" to the "if" statement of
>>> LeaderElection.java:201 to reset the vote? This modification would
>>> implementing resetting the vote when countVotes returns no winner, which
>>> should happen only when the replica itself votes for a dead leader.
>>>
>>> -Flavio
>>>
>>> On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote:
>>>
>>>  [ Sending this direct since the Apache mailserver is rejecting my
>>>> e-mails at the moment ]
>>>>
>>>> As I understand it, 1 and 2 receive a vote for 3 in the first round,
>>>> which causes them to vote for 3 in the second round. So in the second
>>>> round, all votes cast are for 3. But 3 has died, so all votes for it
>>>> are discounted. 1 and 2 continue to vote for 3 ad infinitum, never
>>>> resetting their vote.
>>>>
>>>> Does this sound plausible, or am I missing something?
>>>>
>>>> cheers,
>>>> Henry
>>>>
>>>> On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <fp...@yahoo-inc.com>
>>>> wrote:
>>>> Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in
>>>> your situation. If they exclude 3 in countVotes, then countVotes will
>>>> end up returning 2 and not 3, assuming there is a vote for 2. What am
>>>> I missing?
>>>>
>>>> The problem with QuorumPeer you're pointing at was also an issue with
>>>> the FLE tests, and I couldn't see an easy way around it other than
>>>> timing out and restarting leader election.
>>>>
>>>> Cheers,
>>>> -Flavio
>>>>
>>>>
>>>> On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:
>>>>
>>>> I've been working on adding a TCPResponderThread to the leader election
>>>> process so that if a deployment needs to be TCP only, it can be and
>>>> still
>>>> use all election types. Testing this has exposed what might be a race
>>>> condition in the leader election code that prevents a leader from being
>>>> elected.
>>>>
>>>> Here's the behaviour I see in LETest occasionally. With three nodes
>>>> (reduced
>>>> from 30 for ease of debugging), node 3 gets elected before either node
>>>> 1 or
>>>> node 2 finish their election (there is one round where each node that
>>>> 3 has
>>>> the highest id, and then 3 completes its second round by receiving
>>>> votes for
>>>> itself from 1 and 2, but 1 and 2 do not receive votes from 3).
>>>>
>>>> Now 3 is killed by the test harness. 1 and 2 are still voting for it,
>>>> but
>>>> every time they try, the vote tally excludes 3 since it hasn't been
>>>> heard
>>>> from. They then spin round the voting process, unable to reset their
>>>> vote. I
>>>> expect that the heartbeat mechanism in a running QuorumPeer takes care
>>>> of
>>>> this when the leader is lost, but the associated QuorumPeers aren't
>>>> running.
>>>>
>>>> If this is the case, then there is a simple fix to reset the nodes
>>>> vote to
>>>> themselves if they are voting for a node that hasn't been heard from. I
>>>> don't know why using TCP instead of UDP for the responder thread is
>>>> exacerbating this (and we can't rule out my introducing a bug :)); but
>>>> as
>>>> it's a race condition the different timings associated with waiting on
>>>> a TCP
>>>> socket might just be enough to expose the issue.
>>>>
>>>> Can someone verify this might be possible / figure out what I missed?
>>>>
>>>> cheers,
>>>> Henry
>>>>
>>>>
>>>>
>>>
>>>
>

Re: Possible race in LETest.java

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.
Henry already opened one: ZOOKEEPER-569.

-Flavio

On Nov 11, 2009, at 7:03 AM, Patrick Hunt wrote:

> Closing the loop - what's the status on this? Can one of you open a
> JIRA and provide a patch for this?
>
> Thanks,
>
> Patrick
>
> Flavio Junqueira wrote:
>> Hi Henry, Apologies for the the delay. Your observation sounds  
>> right to
>> me. Here is how I'm reading it; let me know if it makes sense.
>>
>> If everyone votes for 3 in the second round and 3 has crashed, then  
>> in
>> countVotes we will remove all votes to 3 and there will be no vote  
>> left.
>> In such a case, there will be no winner as a result of the call to
>> countVotes and lookForLeader won't change the current vote
>> (LeaderElection.java:201). This is a situation in which we are stuck.
>>
>> Does it sound reasonable to add an "else" to the "if" statement of
>> LeaderElection.java:201 to reset the vote? This modification would
>> implementing resetting the vote when countVotes returns no winner,  
>> which
>> should happen only when the replica itself votes for a dead leader.
>>
>> -Flavio
>>
>> On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote:
>>
>>> [ Sending this direct since the Apache mailserver is rejecting my
>>> e-mails at the moment ]
>>>
>>> As I understand it, 1 and 2 receive a vote for 3 in the first round,
>>> which causes them to vote for 3 in the second round. So in the  
>>> second
>>> round, all votes cast are for 3. But 3 has died, so all votes for it
>>> are discounted. 1 and 2 continue to vote for 3 ad infinitum, never
>>> resetting their vote.
>>>
>>> Does this sound plausible, or am I missing something?
>>>
>>> cheers,
>>> Henry
>>>
>>> On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <fpj@yahoo- 
>>> inc.com>
>>> wrote:
>>> Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in
>>> your situation. If they exclude 3 in countVotes, then countVotes  
>>> will
>>> end up returning 2 and not 3, assuming there is a vote for 2. What  
>>> am
>>> I missing?
>>>
>>> The problem with QuorumPeer you're pointing at was also an issue  
>>> with
>>> the FLE tests, and I couldn't see an easy way around it other than
>>> timing out and restarting leader election.
>>>
>>> Cheers,
>>> -Flavio
>>>
>>>
>>> On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:
>>>
>>> I've been working on adding a TCPResponderThread to the leader  
>>> election
>>> process so that if a deployment needs to be TCP only, it can be  
>>> and still
>>> use all election types. Testing this has exposed what might be a  
>>> race
>>> condition in the leader election code that prevents a leader from  
>>> being
>>> elected.
>>>
>>> Here's the behaviour I see in LETest occasionally. With three nodes
>>> (reduced
>>> from 30 for ease of debugging), node 3 gets elected before either  
>>> node
>>> 1 or
>>> node 2 finish their election (there is one round where each node  
>>> that
>>> 3 has
>>> the highest id, and then 3 completes its second round by receiving
>>> votes for
>>> itself from 1 and 2, but 1 and 2 do not receive votes from 3).
>>>
>>> Now 3 is killed by the test harness. 1 and 2 are still voting for  
>>> it, but
>>> every time they try, the vote tally excludes 3 since it hasn't  
>>> been heard
>>> from. They then spin round the voting process, unable to reset their
>>> vote. I
>>> expect that the heartbeat mechanism in a running QuorumPeer takes  
>>> care of
>>> this when the leader is lost, but the associated QuorumPeers aren't
>>> running.
>>>
>>> If this is the case, then there is a simple fix to reset the nodes
>>> vote to
>>> themselves if they are voting for a node that hasn't been heard  
>>> from. I
>>> don't know why using TCP instead of UDP for the responder thread is
>>> exacerbating this (and we can't rule out my introducing a bug :));  
>>> but as
>>> it's a race condition the different timings associated with  
>>> waiting on
>>> a TCP
>>> socket might just be enough to expose the issue.
>>>
>>> Can someone verify this might be possible / figure out what I  
>>> missed?
>>>
>>> cheers,
>>> Henry
>>>
>>>
>>
>>


Re: Possible race in LETest.java

Posted by Patrick Hunt <ph...@apache.org>.
Closing the loop - what's the status on this? Can one of you open a 
JIRA and provide a patch for this?

Thanks,

Patrick

Flavio Junqueira wrote:
> Hi Henry, Apologies for the the delay. Your observation sounds right to 
> me. Here is how I'm reading it; let me know if it makes sense.
> 
> If everyone votes for 3 in the second round and 3 has crashed, then in 
> countVotes we will remove all votes to 3 and there will be no vote left. 
> In such a case, there will be no winner as a result of the call to 
> countVotes and lookForLeader won't change the current vote 
> (LeaderElection.java:201). This is a situation in which we are stuck.
> 
> Does it sound reasonable to add an "else" to the "if" statement of 
> LeaderElection.java:201 to reset the vote? This modification would 
> implementing resetting the vote when countVotes returns no winner, which 
> should happen only when the replica itself votes for a dead leader.
> 
> -Flavio
> 
> On Oct 28, 2009, at 7:44 AM, Henry Robinson wrote:
> 
>> [ Sending this direct since the Apache mailserver is rejecting my 
>> e-mails at the moment ]
>>
>> As I understand it, 1 and 2 receive a vote for 3 in the first round, 
>> which causes them to vote for 3 in the second round. So in the second 
>> round, all votes cast are for 3. But 3 has died, so all votes for it 
>> are discounted. 1 and 2 continue to vote for 3 ad infinitum, never 
>> resetting their vote.
>>
>> Does this sound plausible, or am I missing something?
>>
>> cheers,
>> Henry
>>
>> On Tue, Oct 27, 2009 at 3:48 PM, Flavio Junqueira <fp...@yahoo-inc.com> 
>> wrote:
>> Hi Henry, I don't understand how 1 and 2 do not end up electing 2 in 
>> your situation. If they exclude 3 in countVotes, then countVotes will 
>> end up returning 2 and not 3, assuming there is a vote for 2. What am 
>> I missing?
>>
>> The problem with QuorumPeer you're pointing at was also an issue with 
>> the FLE tests, and I couldn't see an easy way around it other than 
>> timing out and restarting leader election.
>>
>> Cheers,
>> -Flavio
>>
>>
>> On Oct 27, 2009, at 6:35 AM, Henry Robinson wrote:
>>
>> I've been working on adding a TCPResponderThread to the leader election
>> process so that if a deployment needs to be TCP only, it can be and still
>> use all election types. Testing this has exposed what might be a race
>> condition in the leader election code that prevents a leader from being
>> elected.
>>
>> Here's the behaviour I see in LETest occasionally. With three nodes 
>> (reduced
>> from 30 for ease of debugging), node 3 gets elected before either node 
>> 1 or
>> node 2 finish their election (there is one round where each node that 
>> 3 has
>> the highest id, and then 3 completes its second round by receiving 
>> votes for
>> itself from 1 and 2, but 1 and 2 do not receive votes from 3).
>>
>> Now 3 is killed by the test harness. 1 and 2 are still voting for it, but
>> every time they try, the vote tally excludes 3 since it hasn't been heard
>> from. They then spin round the voting process, unable to reset their 
>> vote. I
>> expect that the heartbeat mechanism in a running QuorumPeer takes care of
>> this when the leader is lost, but the associated QuorumPeers aren't 
>> running.
>>
>> If this is the case, then there is a simple fix to reset the nodes 
>> vote to
>> themselves if they are voting for a node that hasn't been heard from. I
>> don't know why using TCP instead of UDP for the responder thread is
>> exacerbating this (and we can't rule out my introducing a bug :)); but as
>> it's a race condition the different timings associated with waiting on 
>> a TCP
>> socket might just be enough to expose the issue.
>>
>> Can someone verify this might be possible / figure out what I missed?
>>
>> cheers,
>> Henry
>>
>>
> 
>