You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@curator.apache.org by Jordan Zimmerman <jo...@jordanzimmerman.com> on 2020/02/26 14:53:44 UTC

Re: Curator listener issue

A few things:

What is RetryLoopExecutor? Curator does all retries internally. You should not have your own retry mechanism.
Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this internally already.
What do you do with the "isConnected" value? That seems suspicious to me.
You do not get LOST until the session expires. How long is "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.

Other than that, I'd need to see the code for handleConnectionStateChange(), handleTakeSingletonRole() and RetryLoopExecutor. Also, the logs from the instance that doesn't get LOST would be helpful.

-Jordan

> On Feb 26, 2020, at 9:46 AM, Arpit Jain <ja...@gmail.com> wrote:
> 
> Here is the setup I have :
> 
> 2 application instances using Curator (4.2.0) connected to Zookeeper (3.5.6) running docker for leader election/locks/service discovery. The way I create curator instance is below:
>         final CuratorFramework curatorFramework =
>             CuratorFrameworkFactory.newClient(coordinatorHosts, coordinatorSessionTimeout, coordinatorConnectionTimeout, retryPolicy);
> 
>         curatorFramework.getConnectionStateListenable()
>             .addListener((client, newState) -> LOGGER.warn("ZOOKEEPER STATE CHANGED TO : {}", newState));
> 
>          isConnected = RetryLoopExecutor.execute(curatorFramework, () -> {
>                    curatorFramework.start();
>                 curatorFramework.getZookeeperClient().blockUntilConnectedOrTimedOut();
>                 return curatorFramework.getZookeeperClient().isConnected();
> 
>          });
> 
> The Leader selection is done using below method:
> 
>     public void createSingletonServerSelector() throws Exception {
> 
>         if (_logger.isInfoEnabled()) {
>             _logger.info("Initializing SingletonServerSelector on Coordinator ZNode [" + coordinatorZnode + "].");
>         }
>         _singletonServerSelector = new LeaderSelector(curatorFramework, coordinatorZnode, new LeaderSelectorListener() {
> 
>             @Override
>             public final void stateChanged(final CuratorFramework client, final ConnectionState newState) {
>                 handleConnectionStateChange(newState);
>             }
> 
>             @Override
>             public final void takeLeadership(final CuratorFramework client) throws Exception {
>                 handleTakeSingletonRole();
>             }
>         });
>         _singletonServerSelector.setId(coordinatorID);
> 
>         _logger.info("SingletonServerSelector initialized.");
> 
>         _singletonServerSelector.autoRequeue();
> 
>         RetryLoopExecutor.execute(curatorFramework, () -> {
>             _singletonServerSelector.start();
>             return null; 
>         });
> 
>     }
> 
> Once I stop the Zookeeper container, I am getting LOST message in "stateChanged" method above and I am shutting down my application in "handleConnectionStateChange" method. But only weird thing is I don't get "ZOOKEEPER STATE CHANGED TO : LOST" every time which I am printing in the listener above.
> 
> I know this might still not be enough to diagnose the issue but I am not sure what else code I can post. Its not critical for our application because we are just printing log message in listener and LOST message is received.
> 
> Thanks
> 
> 
> On Wed, Feb 26, 2020 at 2:26 PM Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
> Hi,
> 
> You sent two lines of code. I don't know how I'm supposed to help you with just that. I have no information. 
> 
> -Jordan
> 
> > On Feb 26, 2020, at 9:25 AM, Arpit Jain <jain.arpit6@gmail.com <ma...@gmail.com>> wrote:
> > 
> > HI Jordan,
> > 
> > I am not sure what or how to provide the test case. Could you elaborate on how I can provide you the test case/scenario ?
> > 
> > Thanks
>

Re: Curator listener issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

That sleep won't solve the problem as there is a single thread for Connection State Listeners. Instead, you can pass different Executors when you register your listener and your sleep would then have the desired effect.

-Jordan

> On Feb 26, 2020, at 10:37 AM, Arpit Jain <ja...@gmail.com> wrote:
> 
> Thats what exactly I also thought so I put a sleep with 10 seconds before calling shutting down and closing the curator instance but it did not change the behaviour.
> 
> On Wed, Feb 26, 2020 at 3:30 PM Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
> You are shutting down your server in the handleConnectionStateChange() handler. So, maybe your application shutsdown/closes the Curator handle before the other handler gets called. i.e. a kind of race.
> 
> -Jordan
> 
>> On Feb 26, 2020, at 10:15 AM, Arpit Jain <jain.arpit6@gmail.com <ma...@gmail.com>> wrote:
>> 
>> What is RetryLoopExecutor? Curator does all retries internally. You should not have your own retry mechanism.
>> Its just a retry loop for the submitted action. Will remove it if its already done by curator.
>> Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this internally already.
>> Will remove it
>> What do you do with the "isConnected" value? That seems suspicious to me.
>> Its not used anywhere. Just for logging if we get connected or not
>> You do not get LOST until the session expires. How long is "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.
>>       "ConnectionTimeout": 10000,
>>       "Hosts": "localhost:2181",
>>       "MaxRetries": 3,
>>       "RetryTimeout": 3000,
>>       "SessionTimeout": 90000,
>> As I said earlier, I am receiving LOST on both application instances. Its only that specific listener is not getting called. Here are the logs 
>> 
>> [L: WARN] [O: c.t.s.c.ZookeeperHelper] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] ZOOKEEPER STATE CHANGED TO : SUSPENDED
>> 2020-02-26 12:39:13.107+0000 [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40003. Adjusted session timeout ms: 40000
>> 2020-02-26 12:39:13.109+0000 [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 40000
>> 2020-02-26 12:39:13.109+0000 [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 40000
>> 2020-02-26 12:39:13.111+0000 [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
>> 2020-02-26 12:39:13.111+0000 [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
>> 2020-02-26 12:39:13.111+0000 [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: ForkJoinPool.commonPool-worker-1-EventThread] Session expired event received
>> 2020-02-26 12:39:24.405+0000 [L: ERROR] [O: c.t.s.c.ClusteredModeCoordinator] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Could not connect to ZK. Shutting down server
>> 
>> As you can see in above logs, the message "ZOOKEEPER STATE CHANGED TO : LOST" is missing 
>> 
>>     private final void handleConnectionStateChange(final ConnectionState newConnectionState) {
>> 
>>         if (ConnectionState.LOST.name().equalsIgnoreCase(newConnectionState.name())) {
>>             _logger.error("Could not connect to ZK. Shutting down server..");
>>             shutDownPlatform(1);
>>         } else if (_isSingletonServer && ConnectionState.SUSPENDED.name().equalsIgnoreCase(newConnectionState.name())) {
>>             notifyClusterModeCoordinatorListenersOfSingletonRoleLoss();
>>         }
>>     }
>> 
>>     private final void handleTakeSingletonRole() throws Exception {
>> 
>>         _logger.info("handleTakeSingletonRole: Received Singleton Role.");
>> 
>>         // We have just become the Singleton Server. Ha!
>>         try {
>>             while (true) {
>>                 try {
>>                     Thread.sleep(1000);
>>                 } catch (Exception e) {
>>                     // ignore interrupted exception
>>                 }
>>             }
>>     }
>> 
>> Thanks
>> 
>> On Wed, Feb 26, 2020 at 2:53 PM Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>> A few things:
>> 
>> What is RetryLoopExecutor? Curator does all retries internally. You should not have your own retry mechanism.
>> Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this internally already.
>> What do you do with the "isConnected" value? That seems suspicious to me.
>> You do not get LOST until the session expires. How long is "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.
>> 
>> Other than that, I'd need to see the code for handleConnectionStateChange(), handleTakeSingletonRole() and RetryLoopExecutor. Also, the logs from the instance that doesn't get LOST would be helpful.
>> 
>> -Jordan
>> 
>>> On Feb 26, 2020, at 9:46 AM, Arpit Jain <jain.arpit6@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Here is the setup I have :
>>> 
>>> 2 application instances using Curator (4.2.0) connected to Zookeeper (3.5.6) running docker for leader election/locks/service discovery. The way I create curator instance is below:
>>>         final CuratorFramework curatorFramework =
>>>             CuratorFrameworkFactory.newClient(coordinatorHosts, coordinatorSessionTimeout, coordinatorConnectionTimeout, retryPolicy);
>>> 
>>>         curatorFramework.getConnectionStateListenable()
>>>             .addListener((client, newState) -> LOGGER.warn("ZOOKEEPER STATE CHANGED TO : {}", newState));
>>> 
>>>          isConnected = RetryLoopExecutor.execute(curatorFramework, () -> {
>>>                    curatorFramework.start();
>>>                 curatorFramework.getZookeeperClient().blockUntilConnectedOrTimedOut();
>>>                 return curatorFramework.getZookeeperClient().isConnected();
>>> 
>>>          });
>>> 
>>> The Leader selection is done using below method:
>>> 
>>>     public void createSingletonServerSelector() throws Exception {
>>> 
>>>         if (_logger.isInfoEnabled()) {
>>>             _logger.info("Initializing SingletonServerSelector on Coordinator ZNode [" + coordinatorZnode + "].");
>>>         }
>>>         _singletonServerSelector = new LeaderSelector(curatorFramework, coordinatorZnode, new LeaderSelectorListener() {
>>> 
>>>             @Override
>>>             public final void stateChanged(final CuratorFramework client, final ConnectionState newState) {
>>>                 handleConnectionStateChange(newState);
>>>             }
>>> 
>>>             @Override
>>>             public final void takeLeadership(final CuratorFramework client) throws Exception {
>>>                 handleTakeSingletonRole();
>>>             }
>>>         });
>>>         _singletonServerSelector.setId(coordinatorID);
>>> 
>>>         _logger.info("SingletonServerSelector initialized.");
>>> 
>>>         _singletonServerSelector.autoRequeue();
>>> 
>>>         RetryLoopExecutor.execute(curatorFramework, () -> {
>>>             _singletonServerSelector.start();
>>>             return null; 
>>>         });
>>> 
>>>     }
>>> 
>>> Once I stop the Zookeeper container, I am getting LOST message in "stateChanged" method above and I am shutting down my application in "handleConnectionStateChange" method. But only weird thing is I don't get "ZOOKEEPER STATE CHANGED TO : LOST" every time which I am printing in the listener above.
>>> 
>>> I know this might still not be enough to diagnose the issue but I am not sure what else code I can post. Its not critical for our application because we are just printing log message in listener and LOST message is received.
>>> 
>>> Thanks
>>> 
>>> 
>>> On Wed, Feb 26, 2020 at 2:26 PM Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>>> Hi,
>>> 
>>> You sent two lines of code. I don't know how I'm supposed to help you with just that. I have no information. 
>>> 
>>> -Jordan
>>> 
>>> > On Feb 26, 2020, at 9:25 AM, Arpit Jain <jain.arpit6@gmail.com <ma...@gmail.com>> wrote:
>>> > 
>>> > HI Jordan,
>>> > 
>>> > I am not sure what or how to provide the test case. Could you elaborate on how I can provide you the test case/scenario ?
>>> > 
>>> > Thanks
>>> 
>> 
>

Re: Curator listener issue

Posted by Jordan Zimmerman <jo...@jordanzimmerman.com>.

You are shutting down your server in the handleConnectionStateChange() handler. So, maybe your application shutsdown/closes the Curator handle before the other handler gets called. i.e. a kind of race.

-Jordan

> On Feb 26, 2020, at 10:15 AM, Arpit Jain <ja...@gmail.com> wrote:
> 
> What is RetryLoopExecutor? Curator does all retries internally. You should not have your own retry mechanism.
> Its just a retry loop for the submitted action. Will remove it if its already done by curator.
> Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this internally already.
> Will remove it
> What do you do with the "isConnected" value? That seems suspicious to me.
> Its not used anywhere. Just for logging if we get connected or not
> You do not get LOST until the session expires. How long is "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.
>       "ConnectionTimeout": 10000,
>       "Hosts": "localhost:2181",
>       "MaxRetries": 3,
>       "RetryTimeout": 3000,
>       "SessionTimeout": 90000,
> As I said earlier, I am receiving LOST on both application instances. Its only that specific listener is not getting called. Here are the logs 
> 
> [L: WARN] [O: c.t.s.c.ZookeeperHelper] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] ZOOKEEPER STATE CHANGED TO : SUSPENDED
> 2020-02-26 12:39:13.107+0000 [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40003. Adjusted session timeout ms: 40000
> 2020-02-26 12:39:13.109+0000 [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 40000
> 2020-02-26 12:39:13.109+0000 [L: WARN] [O: o.a.c.f.s.ConnectionStateManager] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 40002. Adjusted session timeout ms: 40000
> 2020-02-26 12:39:13.111+0000 [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
> 2020-02-26 12:39:13.111+0000 [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: localhost-startStop-1-EventThread] Session expired event received
> 2020-02-26 12:39:13.111+0000 [L: WARN] [O: o.a.c.ConnectionState] [I: ] [U: ] [S: ] [P: platform2] [T: ForkJoinPool.commonPool-worker-1-EventThread] Session expired event received
> 2020-02-26 12:39:24.405+0000 [L: ERROR] [O: c.t.s.c.ClusteredModeCoordinator] [I: ] [U: ] [S: ] [P: platform2] [T: Curator-ConnectionStateManager-0] Could not connect to ZK. Shutting down server
> 
> As you can see in above logs, the message "ZOOKEEPER STATE CHANGED TO : LOST" is missing 
> 
>     private final void handleConnectionStateChange(final ConnectionState newConnectionState) {
> 
>         if (ConnectionState.LOST.name().equalsIgnoreCase(newConnectionState.name())) {
>             _logger.error("Could not connect to ZK. Shutting down server..");
>             shutDownPlatform(1);
>         } else if (_isSingletonServer && ConnectionState.SUSPENDED.name().equalsIgnoreCase(newConnectionState.name())) {
>             notifyClusterModeCoordinatorListenersOfSingletonRoleLoss();
>         }
>     }
> 
>     private final void handleTakeSingletonRole() throws Exception {
> 
>         _logger.info("handleTakeSingletonRole: Received Singleton Role.");
> 
>         // We have just become the Singleton Server. Ha!
>         try {
>             while (true) {
>                 try {
>                     Thread.sleep(1000);
>                 } catch (Exception e) {
>                     // ignore interrupted exception
>                 }
>             }
>     }
> 
> Thanks
> 
> On Wed, Feb 26, 2020 at 2:53 PM Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
> A few things:
> 
> What is RetryLoopExecutor? Curator does all retries internally. You should not have your own retry mechanism.
> Calling blockUntilConnectedOrTimedOut() is unnecessary. Curator does this internally already.
> What do you do with the "isConnected" value? That seems suspicious to me.
> You do not get LOST until the session expires. How long is "coordinatorSessionTimeout"? You won't receive LOST until that has elapsed.
> 
> Other than that, I'd need to see the code for handleConnectionStateChange(), handleTakeSingletonRole() and RetryLoopExecutor. Also, the logs from the instance that doesn't get LOST would be helpful.
> 
> -Jordan
> 
>> On Feb 26, 2020, at 9:46 AM, Arpit Jain <jain.arpit6@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Here is the setup I have :
>> 
>> 2 application instances using Curator (4.2.0) connected to Zookeeper (3.5.6) running docker for leader election/locks/service discovery. The way I create curator instance is below:
>>         final CuratorFramework curatorFramework =
>>             CuratorFrameworkFactory.newClient(coordinatorHosts, coordinatorSessionTimeout, coordinatorConnectionTimeout, retryPolicy);
>> 
>>         curatorFramework.getConnectionStateListenable()
>>             .addListener((client, newState) -> LOGGER.warn("ZOOKEEPER STATE CHANGED TO : {}", newState));
>> 
>>          isConnected = RetryLoopExecutor.execute(curatorFramework, () -> {
>>                    curatorFramework.start();
>>                 curatorFramework.getZookeeperClient().blockUntilConnectedOrTimedOut();
>>                 return curatorFramework.getZookeeperClient().isConnected();
>> 
>>          });
>> 
>> The Leader selection is done using below method:
>> 
>>     public void createSingletonServerSelector() throws Exception {
>> 
>>         if (_logger.isInfoEnabled()) {
>>             _logger.info("Initializing SingletonServerSelector on Coordinator ZNode [" + coordinatorZnode + "].");
>>         }
>>         _singletonServerSelector = new LeaderSelector(curatorFramework, coordinatorZnode, new LeaderSelectorListener() {
>> 
>>             @Override
>>             public final void stateChanged(final CuratorFramework client, final ConnectionState newState) {
>>                 handleConnectionStateChange(newState);
>>             }
>> 
>>             @Override
>>             public final void takeLeadership(final CuratorFramework client) throws Exception {
>>                 handleTakeSingletonRole();
>>             }
>>         });
>>         _singletonServerSelector.setId(coordinatorID);
>> 
>>         _logger.info("SingletonServerSelector initialized.");
>> 
>>         _singletonServerSelector.autoRequeue();
>> 
>>         RetryLoopExecutor.execute(curatorFramework, () -> {
>>             _singletonServerSelector.start();
>>             return null; 
>>         });
>> 
>>     }
>> 
>> Once I stop the Zookeeper container, I am getting LOST message in "stateChanged" method above and I am shutting down my application in "handleConnectionStateChange" method. But only weird thing is I don't get "ZOOKEEPER STATE CHANGED TO : LOST" every time which I am printing in the listener above.
>> 
>> I know this might still not be enough to diagnose the issue but I am not sure what else code I can post. Its not critical for our application because we are just printing log message in listener and LOST message is received.
>> 
>> Thanks
>> 
>> 
>> On Wed, Feb 26, 2020 at 2:26 PM Jordan Zimmerman <jordan@jordanzimmerman.com <ma...@jordanzimmerman.com>> wrote:
>> Hi,
>> 
>> You sent two lines of code. I don't know how I'm supposed to help you with just that. I have no information. 
>> 
>> -Jordan
>> 
>> > On Feb 26, 2020, at 9:25 AM, Arpit Jain <jain.arpit6@gmail.com <ma...@gmail.com>> wrote:
>> > 
>> > HI Jordan,
>> > 
>> > I am not sure what or how to provide the test case. Could you elaborate on how I can provide you the test case/scenario ?
>> > 
>> > Thanks
>> 
>