You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by Vishal Kher <vi...@gmail.com> on 2011/01/07 11:17:57 UTC

Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/
-----------------------------------------------------------

Review request for zookeeper and fpj.


Summary
-------


This addresses bug ZOOKEEPER-932.
    https://issues.apache.org/jira/browse/ZOOKEEPER-932


Diffs
-----

  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
  trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 

Diff: https://reviews.apache.org/r/240/diff


Testing
-------

- ant test-core-java
- systest
- basic hand testing
- rebooted follower/leader several times


Thanks,

Vishal


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by Vishal Kher <vi...@gmail.com>.

> On 2011-01-07 06:19:28, Thomas Koch wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 127
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line127>
> >
> >     Nowadays we're not restricted anymore to short variable names. Just call the variable "workerThreadsCounter" and skip the comment.
> >     Most developers know, that Cnt will mean Counter, but it doesn't hurt either to just spend a few characters in plus to make it easier to read like text.
> >     It adds to the confusion that there is also a method getThreadCount in this class. So there are two possible ways to write Count.

ok


- Vishal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review88
-----------------------------------------------------------


On 2011-01-07 02:21:38, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-07 02:21:38)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by Thomas Koch <th...@koch.ro>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review88
-----------------------------------------------------------



trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
<https://reviews.apache.org/r/240/#comment162>

    Nowadays we're not restricted anymore to short variable names. Just call the variable "workerThreadsCounter" and skip the comment.
    Most developers know, that Cnt will mean Counter, but it doesn't hurt either to just spend a few characters in plus to make it easier to read like text.
    It adds to the confusion that there is also a method getThreadCount in this class. So there are two possible ways to write Count.



trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java
<https://reviews.apache.org/r/240/#comment163>

    indentation above


- Thomas


On 2011-01-07 02:21:38, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-07 02:21:38)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by Vishal Kher <vi...@gmail.com>.

> On 2011-01-09 06:48:15, fpj wrote:
> > Great job, Vishal. I have mostly minor points about your patch. Thanks for working on it!

Thanks, Flavio. I will incorporate the comments and send out a diff for review soon. Few comments below..


> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 336
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line336>
> >
> >     If this error is fatal, then I was wondering if we shouldn't abort lookForLeader() and perhaps even propagate it up so that we kill the peer. What do you think?

Looking at the code this error case should never happen. But it would be good to propogate back fatal errors. How do you propose to progate the error and kill the peer?


> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 896
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line896>
> >
> >     Shouldn't we call sw.shutdown() from finish?

shutdown sets the running flag to false and just interrupts the thread. The thread calls finish before exiting. I found it to be cleaner to call finish from one place (while exiting the thread).


> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java, line 576
> > <https://reviews.apache.org/r/240/diff/1/?file=9409#file9409line576>
> >
> >     If this is commented out, we should probably remove these lines from the patch.

Opps..


> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java, line 339
> > <https://reviews.apache.org/r/240/diff/1/?file=9409#file9409line339>
> >
> >     Isn't it better to break this one up into multiple test cases? (Same for the previous test case.)

ok


> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java, line 459
> > <https://reviews.apache.org/r/240/diff/1/?file=9409#file9409line459>
> >
> >     You may consider Assert.assertTrue(msg, cnt == ecnt).

ok


- Vishal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review95
-----------------------------------------------------------


On 2011-01-07 02:21:38, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-07 02:21:38)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by Vishal Kher <vi...@gmail.com>.

> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 336
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line336>
> >
> >     If this error is fatal, then I was wondering if we shouldn't abort lookForLeader() and perhaps even propagate it up so that we kill the peer. What do you think?
> 
> Vishal Kher wrote:
>     Looking at the code this error case should never happen. But it would be good to propogate back fatal errors. How do you propose to progate the error and kill the peer?
> 
> fpj wrote:
>     One option is to throw an exception from lookForLeader, catch it in the main loop in QuorumPeer, shutdown the peer in the catch block, and exit the main thread. The main difficulty is propagating the error to lookForLeader. The only option I see is propagating it through a special message that FLE receives through the recvQueue.
> 
> fpj wrote:
>     I don't know if you have made any progress here, Vishal, but if you haven't perhaps we should consider having a separate jira for it. I've been thinking that this is a more general problem with error handling between QCM and FLE. How does it sound?

Sounds good to me. Can I get a "ship it" if you are ok with rest of the code?


- Vishal


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review95
-----------------------------------------------------------


On 2011-01-17 03:55:41, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-17 03:55:41)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by fp...@apache.org.

> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 336
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line336>
> >
> >     If this error is fatal, then I was wondering if we shouldn't abort lookForLeader() and perhaps even propagate it up so that we kill the peer. What do you think?
> 
> Vishal Kher wrote:
>     Looking at the code this error case should never happen. But it would be good to propogate back fatal errors. How do you propose to progate the error and kill the peer?
> 
> fpj wrote:
>     One option is to throw an exception from lookForLeader, catch it in the main loop in QuorumPeer, shutdown the peer in the catch block, and exit the main thread. The main difficulty is propagating the error to lookForLeader. The only option I see is propagating it through a special message that FLE receives through the recvQueue.

I don't know if you have made any progress here, Vishal, but if you haven't perhaps we should consider having a separate jira for it. I've been thinking that this is a more general problem with error handling between QCM and FLE. How does it sound? 


- fpj


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review95
-----------------------------------------------------------


On 2011-01-17 03:55:41, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-17 03:55:41)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by fp...@apache.org.

> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 896
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line896>
> >
> >     Shouldn't we call sw.shutdown() from finish?
> 
> Vishal Kher wrote:
>     shutdown sets the running flag to false and just interrupts the thread. The thread calls finish before exiting. I found it to be cleaner to call finish from one place (while exiting the thread).

Ok, sounds good.


> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 336
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line336>
> >
> >     If this error is fatal, then I was wondering if we shouldn't abort lookForLeader() and perhaps even propagate it up so that we kill the peer. What do you think?
> 
> Vishal Kher wrote:
>     Looking at the code this error case should never happen. But it would be good to propogate back fatal errors. How do you propose to progate the error and kill the peer?

One option is to throw an exception from lookForLeader, catch it in the main loop in QuorumPeer, shutdown the peer in the catch block, and exit the main thread. The main difficulty is propagating the error to lookForLeader. The only option I see is propagating it through a special message that FLE receives through the recvQueue.  


- fpj


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review95
-----------------------------------------------------------


On 2011-01-17 03:55:41, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-17 03:55:41)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by fp...@apache.org.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review95
-----------------------------------------------------------


Great job, Vishal. I have mostly minor points about your patch. Thanks for working on it!


trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
<https://reviews.apache.org/r/240/#comment174>

    The name of this variable doesn't bother me.



trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
<https://reviews.apache.org/r/240/#comment173>

    Indentation seems to be incorrect for this block.



trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
<https://reviews.apache.org/r/240/#comment175>

    If this error is fatal, then I was wondering if we shouldn't abort lookForLeader() and perhaps even propagate it up so that we kill the peer. What do you think?



trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
<https://reviews.apache.org/r/240/#comment178>

    Indentation seems to be incorrect.



trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
<https://reviews.apache.org/r/240/#comment177>

    Shouldn't we call sw.shutdown() from finish?



trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java
<https://reviews.apache.org/r/240/#comment172>

    Isn't it better to break this one up into multiple test cases? (Same for the previous test case.)



trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java
<https://reviews.apache.org/r/240/#comment180>

    You may consider Assert.assertTrue(msg, cnt == ecnt).



trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java
<https://reviews.apache.org/r/240/#comment179>

    If this is commented out, we should probably remove these lines from the patch.


- fpj


On 2011-01-07 02:21:38, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-07 02:21:38)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by fp...@apache.org.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review167
-----------------------------------------------------------


I'd like to check that tests pass for me before committing. I'll mark it to ship once I do it, and I need a 3.3.3 patch for that.

- fpj


On 2011-01-17 03:55:41, Vishal Kher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
> 
> (Updated 2011-01-17 03:55:41)
> 
> 
> Review request for zookeeper and fpj.
> 
> 
> Summary
> -------
> 
> QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from becoming a follower.
> 
> I have made changes as per ZOOKEEPER-932
> 
> 
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
> 
> 
> Diffs
> -----
> 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 
> 
> Diff: https://reviews.apache.org/r/240/diff
> 
> 
> Testing
> -------
> 
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
> 
> 
> Thanks,
> 
> Vishal
> 
>


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by Vishal Kher <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/
-----------------------------------------------------------

(Updated 2011-01-17 03:55:41.089412)


Review request for zookeeper and fpj.


Changes
-------

Resubmitting patch for review. This patch incorporates previous review comments + has following fixes.
CnxManagerTest.java
      In the second patch that I sent for review some of the tests did not have the @Test junit tag. As a result, they were not getting executed. 
FastLeaderElection.java
LearnerHandler.java
      Change in thread name
QuorumCnxManager.java
       Changed the place where the threadCounter was decremented. Other minor fixes.


Summary
-------

QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
If the peer happens to be the leader, then it will block new peers from becoming a follower.

I have made changes as per ZOOKEEPER-932


This addresses bug ZOOKEEPER-932.
    https://issues.apache.org/jira/browse/ZOOKEEPER-932


Diffs (updated)
-----

  trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 1040328 
  trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1040328 
  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
  trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 

Diff: https://reviews.apache.org/r/240/diff


Testing
-------

- ant test-core-java
- systest
- basic hand testing
- rebooted follower/leader several times


Thanks,

Vishal


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by Vishal Kher <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/
-----------------------------------------------------------

(Updated 2011-01-10 04:30:27.786025)


Review request for zookeeper and fpj.


Changes
-------

Made changes per suggestions. 


Summary
-------

QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
If the peer happens to be the leader, then it will block new peers from becoming a follower.

I have made changes as per ZOOKEEPER-932


This addresses bug ZOOKEEPER-932.
    https://issues.apache.org/jira/browse/ZOOKEEPER-932


Diffs (updated)
-----

  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1057153 
  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1057153 
  trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1057153 

Diff: https://reviews.apache.org/r/240/diff


Testing
-------

- ant test-core-java
- systest
- basic hand testing
- rebooted follower/leader several times


Thanks,

Vishal


Re: Review Request: Move blocking read/write calls to SendWorker and RecvWorker Threads

Posted by Vishal Kher <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/
-----------------------------------------------------------

(Updated 2011-01-07 02:21:38.514056)


Review request for zookeeper and fpj.


Changes
-------

Including description. For some reason my description got dropped from the earlier request.


Summary (updated)
-------

QuorumCnxManager performed blocking socket IO at a few places. As a result, QCM on a peer could block forever which would prevent other peers from connecting to the blocked peer.
If the peer happens to be the leader, then it will block new peers from becoming a follower.

I have made changes as per ZOOKEEPER-932


This addresses bug ZOOKEEPER-932.
    https://issues.apache.org/jira/browse/ZOOKEEPER-932


Diffs
-----

  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328 
  trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328 
  trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328 

Diff: https://reviews.apache.org/r/240/diff


Testing
-------

- ant test-core-java
- systest
- basic hand testing
- rebooted follower/leader several times


Thanks,

Vishal