You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@curator.apache.org by GitBox <gi...@apache.org> on 2020/04/01 01:25:01 UTC

[GitHub] [curator] Randgalt opened a new pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Randgalt opened a new pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357
 
 
   There is a race whereby the ZooKeeper connection can be healed before Curator is finished processing the new connection state. When this happens
   the Curator instance becomes a Zombie stuck in the LOST state. This fix is a "hack". ConnectionStateManager will notice that the connection state is
   LOST but that the Curator instance reports that it is connected. When this happens, it is logged and the connection is reset.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] Randgalt merged pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
Randgalt merged pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] Randgalt commented on issue #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
Randgalt commented on issue #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#issuecomment-606972674
 
 
   attn: @jschlather and @chuchao333

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401813803
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   Much of that backend code needs to be reconsidered. If you're interested it would be a great project. I could help. If, I find the time, I'll rework some of it. For example, the ConnectionState code belongs in the client code, not the framework code. Pulling all that out and re-writing it directly in the client module would simplify things and give us an opportunity to remove some races due to their being two different threads (ZK's and Curator's).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401325301
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   Check now @chuchao333

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401328380
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   LGTM now, thanks for the fix.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401329641
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   TBH - this makes me nervous. I worry about an infinite loop of resets. I'll think about this some more before I merge.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401308665
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   right now, the `checkSessionExperiation` is only called `if sessionExpirationPercent > 0`, we should also change that?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
Randgalt commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401319664
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   Maybe - I feel like this is such a hack I wanted to limit it. But, maybe you're right.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401654234
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   @Randgalt yep, I agree. I would also prefer if we could fix this without firing the unexpected LOST in the first place.
   
   Also TBH, some of the curator code sounds "no obvious defect" instead of "obviously no defect" to me :) I do understand it's complex enough given all the edge cases it has to handle.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [curator] chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator

Posted by GitBox <gi...@apache.org>.
chuchao333 commented on a change in pull request #357: [CURATOR-525] Fix for Zombie LOST Curator
URL: https://github.com/apache/curator/pull/357#discussion_r401308665
 
 

 ##########
 File path: curator-framework/src/main/java/org/apache/curator/framework/state/ConnectionStateManager.java
 ##########
 @@ -319,8 +319,18 @@ else if ( currentConnectionState == ConnectionState.LOST )
         {
             try
             {
-                // give ConnectionState.checkTimeouts() a chance to run, reset ensemble providers, etc.
-                client.getZookeeperClient().getZooKeeper();
+                if ( client.getZookeeperClient().isConnected() )
 
 Review comment:
   right now, the `checkSessionExpiration` is only called `if sessionExpirationPercent > 0`, we should also change that?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services