Posted to dev@curator.apache.org by GitBox <gi...@apache.org> on 2021/10/29 09:30:23 UTC

[GitHub] [curator] woaishixiaoxiao opened a new pull request #398: fix the bug of double master when use LeaderLatch to select the leader

woaishixiaoxiao opened a new pull request #398:
URL: https://github.com/apache/curator/pull/398


   When I use LeaderLatch to elect a leader, two clients can end up as leader at the same time.
   The timeline is as follows:
   1. The ZooKeeper ensemble elects a new leader node because the zxid overflowed.
   2. Client A (not the leader before the zxid overflow) and client B (the leader before the overflow) both enter the SUSPENDED state; B sets its leader status to false.
   3. B enters the RECONNECTED state, calls reset(), and deletes its old path.
   4. A enters the RECONNECTED state and receives the delete event for its predecessor node. It calls getChildren on the ZooKeeper server, finds that its node has the smallest sequence number, and sets itself as leader.
   5. B creates a new ephemeral sequential node and calls getChildren. Its node is not the smallest, so it watches for the delete event of the preceding node.
   6. A deletes its old path.
   7. B receives the delete event for its predecessor, calls getChildren, finds that its node now has the smallest sequence number, and sets itself as leader.
   8. A creates a new ephemeral sequential node and calls getChildren. Its node is not the smallest, so it watches the preceding node, but it never sets itself back to the non-leader state. Because of step 4, A is still in the leader state.
   Now A and B are both leader at the same time.
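   The eight steps can be replayed deterministically to show why the missing "set non-leader" in step 8 matters. This is a hypothetical, single-threaded model, not Curator code: FakeZk, Client, checkLeadership and clearWhenNotFirst are illustrative names, leadership changes only where the timeline says it does, and the guard shown is just the behavior step 8 describes as missing, not necessarily what this PR changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical single-threaded replay of the eight-step timeline.
// No Curator or ZooKeeper classes are used: znodes are strings in a sorted
// set and "leader" is a plain boolean per client.
class DoubleLeaderReplay {

    // Stand-in for the latch parent znode with sequential children.
    static final class FakeZk {
        private final TreeSet<String> children = new TreeSet<>();
        private int seq = 0;

        String createSequential() {
            // Zero padding keeps lexicographic order equal to creation order.
            String path = String.format("latch-%010d", seq++);
            children.add(path);
            return path;
        }

        void delete(String path) {
            children.remove(path);
        }

        List<String> getChildren() {
            return new ArrayList<>(children); // already sorted
        }
    }

    // Hypothetical client state; field and method names are illustrative only.
    static final class Client {
        final boolean clearWhenNotFirst; // the guard step 8 says is missing
        String ourPath;
        boolean leader;

        Client(boolean clearWhenNotFirst) {
            this.clearWhenNotFirst = clearWhenNotFirst;
        }

        void checkLeadership(FakeZk zk) {
            int index = zk.getChildren().indexOf(ourPath);
            if (index == 0) {
                leader = true;
            } else if (clearWhenNotFirst) {
                leader = false;
            }
            // Otherwise the client only watches its predecessor and leaves
            // 'leader' untouched, which is the behavior reported in step 8.
        }
    }

    // Returns how many clients believe they are leader after step 8.
    static int replay(boolean clearWhenNotFirst) {
        FakeZk zk = new FakeZk();
        Client a = new Client(clearWhenNotFirst);
        Client b = new Client(clearWhenNotFirst);

        // Before the ZooKeeper leader switch: B holds the smallest node.
        b.ourPath = zk.createSequential();
        a.ourPath = zk.createSequential();
        b.checkLeadership(zk);
        a.checkLeadership(zk);

        b.leader = false;                  // 2. SUSPENDED: B drops leadership
        zk.delete(b.ourPath);              // 3. B reset: delete old path
        a.checkLeadership(zk);             // 4. A sees predecessor gone, becomes leader
        b.ourPath = zk.createSequential(); // 5. B creates a new node...
        b.checkLeadership(zk);             //    ...and is not first
        zk.delete(a.ourPath);              // 6. A deletes its old path
        b.checkLeadership(zk);             // 7. B is now first, becomes leader
        a.ourPath = zk.createSequential(); // 8. A creates a new node...
        a.checkLeadership(zk);             //    ...and is not first

        return (a.leader ? 1 : 0) + (b.leader ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println("leaders without guard: " + replay(false));
        System.out.println("leaders with guard:    " + replay(true));
    }
}
```

   Running main prints 2 leaders without the guard and 1 with it: clearing leadership whenever the client's node is not the smallest removes the stale "leader" flag that A carries over from step 4.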
   
    
    
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@curator.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [curator] eolivelli commented on pull request #398: fix the bug of double leader when use LeaderLatch to select the leader

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-954614884


   @woaishixiaoxiao thanks for sharing your fix.
   Do you think we can add a test case to cover this change?





[GitHub] [curator] woaishixiaoxiao commented on pull request #398: fix the bug of double leader when use LeaderLatch to select the leader

Posted by GitBox <gi...@apache.org>.
woaishixiaoxiao commented on pull request #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-957782794


   > @woaishixiaoxiao thanks for sharing your fix, do you think that we can add a test case to cover this change ?
   
   Hi, I have added a unit test. Please review and approve. Thanks!





[GitHub] [curator] woaishixiaoxiao edited a comment on pull request #398: fix the bug of double leader when use LeaderLatch to select the leader

Posted by GitBox <gi...@apache.org>.
woaishixiaoxiao edited a comment on pull request #398:
URL: https://github.com/apache/curator/pull/398#issuecomment-954646817


   > @woaishixiaoxiao thanks for sharing your fix, do you think that we can add a test case to cover this change ?
   
   OK, I will try it.
   I also found another issue related to leader election. When the ZooKeeper ensemble switches leaders and then returns to normal, every client goes through the state transitions CONNECTED -> SUSPENDED -> RECONNECTED.
   Because LeaderLatch resets its leader status when it handles RECONNECTED, each client first sets its leader status to false, then deletes its old ephemeral sequential node and creates a new one. This makes the application switch leaders several times, and some applications, such as message queues, do not want such frequent switchovers. It also means that although the ZooKeeper server pushes the NodeDeleted event only once, the client runs its node-deleted callback several times on the same path, because it has registered multiple watches locally (creating the new path calls getChildren and sets a watch, and handling the predecessor's delete event also calls getChildren and sets a watch).
   Why don't we replace StandardConnectionStateErrorPolicy with SessionConnectionStateErrorPolicy? That would avoid the behavior described above.
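   The policy question can be modeled the same way. This is a hedged sketch, not Curator's implementation: ConnState, ErrorPolicy, LatchModel and firstAfterReset are hypothetical stand-ins. The two policies encode the documented intent of StandardConnectionStateErrorPolicy (SUSPENDED and LOST are errors) and SessionConnectionStateErrorPolicy (only LOST is an error), and the RECONNECTED branch assumes the latch resets only when SUSPENDED counts as an error or leadership was already lost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how the connection-state error policy decides whether
// a leader gives up leadership across CONNECTED -> SUSPENDED -> RECONNECTED.
class PolicyReplay {

    enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    interface ErrorPolicy {
        boolean isErrorState(ConnState state);
    }

    // Intent of StandardConnectionStateErrorPolicy: SUSPENDED and LOST are errors.
    static final ErrorPolicy STANDARD =
            s -> s == ConnState.SUSPENDED || s == ConnState.LOST;

    // Intent of SessionConnectionStateErrorPolicy: only LOST is an error.
    static final ErrorPolicy SESSION = s -> s == ConnState.LOST;

    static final class LatchModel {
        private final ErrorPolicy policy;
        private boolean leader;
        // Callbacks the application would observe on each leadership change.
        final List<String> callbacks = new ArrayList<>();

        LatchModel(ErrorPolicy policy, boolean initiallyLeader) {
            this.policy = policy;
            this.leader = initiallyLeader;
        }

        private void setLeadership(boolean value) {
            if (value != leader) {
                leader = value;
                callbacks.add(value ? "isLeader" : "notLeader");
            }
        }

        // 'firstAfterReset' is a hypothetical input: whether the re-created
        // node would again be the smallest child.
        void onStateChange(ConnState state, boolean firstAfterReset) {
            switch (state) {
                case SUSPENDED:
                    if (policy.isErrorState(ConnState.SUSPENDED)) {
                        setLeadership(false);
                    }
                    break;
                case RECONNECTED:
                    if (policy.isErrorState(ConnState.SUSPENDED) || !leader) {
                        // Reset: drop leadership, replace the node, re-check order.
                        setLeadership(false);
                        setLeadership(firstAfterReset);
                    }
                    break;
                case LOST:
                    setLeadership(false);
                    break;
                default:
                    break;
            }
        }
    }

    // A current leader rides out a brief ZooKeeper leader switch.
    static List<String> replay(ErrorPolicy policy) {
        LatchModel latch = new LatchModel(policy, true);
        latch.onStateChange(ConnState.SUSPENDED, true);
        latch.onStateChange(ConnState.RECONNECTED, true);
        return latch.callbacks;
    }

    public static void main(String[] args) {
        System.out.println("standard: " + replay(STANDARD));
        System.out.println("session:  " + replay(SESSION));
    }
}
```

   Under the standard policy the application sees notLeader followed by isLeader even though the leader never changed; under the session policy it sees no callbacks. With a real client the policy is chosen when building the CuratorFramework through the builder's connectionStateErrorPolicy(...) method. The trade-off is that with the session policy a suspended leader keeps acting as leader until the session is actually lost.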

