Posted to dev@curator.apache.org by "Rhys Yarranton (Jira)" <ji...@apache.org> on 2020/06/03 06:06:00 UTC

[jira] [Comment Edited] (CURATOR-570) Excessive calls to ZooKeeper.updateServerList (which can result in session death)

    [ https://issues.apache.org/jira/browse/CURATOR-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124622#comment-17124622 ] 

Rhys Yarranton edited comment on CURATOR-570 at 6/3/20, 6:05 AM:
-----------------------------------------------------------------

Cam raises a good question.  I just put a debugger on a local test program and found that helperConnectionString and ensembleProviderConnectionString had superficially different values, one using host names and the other using IP addresses.  Looking further, the two can also disagree because the servers are listed in a different order.  (In our case that appears to be possible, even likely.)

I also notice that the code for getNewConnectionString differs between 4.2 and 4.3.  Here is the 4.2 version:
{code:java}
String getNewConnectionString()
{
    String helperConnectionString = (helper != null) ? helper.getConnectionString() : null;
    return ((helperConnectionString != null) && !ensembleProvider.getConnectionString().equals(helperConnectionString)) ? helperConnectionString : null;
}
{code}
Our worst problems were under 4.2, so it's possible the difference is significant.
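
To illustrate the ordering point, here is a minimal sketch of an order-insensitive comparison.  It is not Curator code: the class and method names (ConnectionStringCompare, normalize, sameServers) and the host names are hypothetical.  It also does nothing about the host name vs IP address mismatch, which would need DNS resolution.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Illustrative only; not part of Curator or ZooKeeper.
public class ConnectionStringCompare {

    // Splits "host1:2181,host2:2181/chroot" into its server entries and an
    // optional chroot suffix, then rebuilds it with the entries lower-cased,
    // de-duplicated and sorted.
    public static String normalize(String connectionString) {
        String servers = connectionString.trim();
        String chroot = "";
        int slash = servers.indexOf('/');
        if (slash >= 0) {
            chroot = servers.substring(slash);
            servers = servers.substring(0, slash);
        }
        Set<String> sorted = Arrays.stream(servers.split(","))
                .map(String::trim)
                .filter(entry -> !entry.isEmpty())
                .map(entry -> entry.toLowerCase(Locale.ROOT))
                .collect(Collectors.toCollection(TreeSet::new));
        return String.join(",", sorted) + chroot;
    }

    // True when both strings name the same set of host:port entries and the
    // same chroot, regardless of order.
    public static boolean sameServers(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String fromProvider = "zk1:2181,zk2:2181,zk3:2181";
        String fromHelper = "zk3:2181,zk1:2181,zk2:2181";

        // Plain equals(), as in the 4.2 code above, sees a change.
        System.out.println(fromProvider.equals(fromHelper));          // false
        // The normalized comparison does not.
        System.out.println(sameServers(fromProvider, fromHelper));    // true
        // Host name vs IP address still compares as different.
        System.out.println(sameServers("zk1:2181", "10.0.0.1:2181")); // false
    }
}
```

With a plain String.equals, a reordered but otherwise identical server list looks like a new connection string, which would be enough to trigger handleNewConnectionString.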

Update: The 4.2 vs 4.3 diff is part of CURATOR-551.



> Excessive calls to ZooKeeper.updateServerList (which can result in session death)
> ---------------------------------------------------------------------------------
>
>                 Key: CURATOR-570
>                 URL: https://issues.apache.org/jira/browse/CURATOR-570
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Framework
>    Affects Versions: 4.2.0, 4.3.0
>            Reporter: Rhys Yarranton
>            Priority: Major
>
> On suspend and reconnect, Curator calls ZooKeeper.updateServerList via ConnectionState.checkState --> ConnectionState.handleNewConnectionString.  In addition, recipes may be triggered by this as well, and they too call ZooKeeper.updateServerList, via ConnectionState.checkTimeouts --> ConnectionState.handleNewConnectionString.
> This happens even though the connection string has not actually changed.
> Due to ZOOKEEPER-3825, this can cause the connection to be closed immediately.  On its own this would be perceived as a glitch.  But due to the Curator-induced calls, what we see is a cycle of SUSPENDED/RECONNECTED, until eventually the session dies and a new session is created.
> Based on the source code (at time of writing), ZooKeeper.updateServerList is not intended to be called frequently like this.


