Posted to issues@zookeeper.apache.org by "Hoang Dang (Jira)" <ji...@apache.org> on 2020/04/01 04:25:00 UTC

[jira] [Created] (ZOOKEEPER-3778) Cannot upgrade from 3.5.7 to 3.6.0 due to multiAddress.reachabilityCheckEnabled

Hoang Dang created ZOOKEEPER-3778:
-------------------------------------

             Summary: Cannot upgrade from 3.5.7 to 3.6.0 due to multiAddress.reachabilityCheckEnabled
                 Key: ZOOKEEPER-3778
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3778
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.6.0
            Reporter: Hoang Dang


I upgraded our cluster from 3.5.7 to 3.6.0. I made a small change to the config for metricsProvider (prometheus), which I don't expect to affect the cluster's behavior. But we get the following error log:
{code:java}
2020-04-01 04:04:57,892 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@292] - shutdown Follower
2020-04-01 04:04:57,892 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@863] - Peer state changed: looking
2020-04-01 04:04:57,892 [myid:1] - WARN  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1501] - PeerState set to LOOKING
2020-04-01 04:04:57,892 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1371] - LOOKING
2020-04-01 04:04:57,892 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):FastLeaderElection@931] - New election. My id = 1, proposed zxid=0x140000044b
2020-04-01 04:04:57,894 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:1, n.state:LOOKING, n.leader:1, n.round:$
2020-04-01 04:04:57,895 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:2, n.state:FOLLOWING, n.leader:3, n.roun$
2020-04-01 04:04:57,896 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@376] - Notification: my state:LOOKING; n.sid:3, n.state:LEADING, n.leader:3, n.round:$
2020-04-01 04:04:57,896 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@857] - Peer state changed: following
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@1453] - FOLLOWING
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1246] - minSessionTimeout set to 4000
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1255] - maxSessionTimeout set to 40000
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ResponseCache@45] - Response cache size is initialized with value 400.
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ResponseCache@45] - Response cache size is initialized with value 400.
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@111] - zookeeper.pathStats.slotCapacity = 60
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@112] - zookeeper.pathStats.slotDuration = 15
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@113] - zookeeper.pathStats.maxDepth = 6
2020-04-01 04:04:57,897 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@114] - zookeeper.pathStats.initialDelay = 5
2020-04-01 04:04:57,898 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@115] - zookeeper.pathStats.delay = 5
2020-04-01 04:04:57,898 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):RequestPathMetricsCollector@116] - zookeeper.pathStats.enabled = false
2020-04-01 04:04:57,898 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1470] - The max bytes for all large requests are set to 104857600
2020-04-01 04:04:57,898 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@1484] - The large request threshold is set to -1
2020-04-01 04:04:57,898 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):ZooKeeperServer@329] - Created server with tickTime 2000 minSessionTimeout 4000 maxSes$
2020-04-01 04:04:57,898 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@75] - FOLLOWING - LEADER ELECTION TOOK - 5 MS
2020-04-01 04:04:57,899 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumPeer@863] - Peer state changed: following - discovery
2020-04-01 04:04:57,900 [myid:1] - WARN  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):Follower@129] - Exception when following the leader
java.lang.IllegalArgumentException
        at java.base/java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1295)
        at java.base/java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1181)
        at java.base/java.util.concurrent.Executors.newFixedThreadPool(Executors.java:92)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:275)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:87)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1455)
{code}
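The top frames of this trace are consistent with a thread pool being created with size 0. The following is a minimal standalone sketch of that behavior, not ZooKeeper code; the class name ZeroSizePoolDemo and the method rejects are made up for illustration. It only shows that Executors.newFixedThreadPool throws IllegalArgumentException when asked for zero threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: ZeroSizePoolDemo and rejects() are hypothetical names,
// not part of ZooKeeper.
class ZeroSizePoolDemo {

    // Returns true if Executors.newFixedThreadPool refuses the given size.
    static boolean rejects(int size) {
        try {
            ExecutorService executor = Executors.newFixedThreadPool(size);
            executor.shutdown(); // release the pool right away, nothing was submitted
            return false;
        } catch (IllegalArgumentException e) {
            // ThreadPoolExecutor's constructor rejects a maximum pool size <= 0
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("size 0 rejected: " + rejects(0));
        System.out.println("size 1 rejected: " + rejects(1));
    }
}
```

Size 0 is rejected and size 1 is accepted, so an empty address collection passed as the pool size would produce the exception shown in the log.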
 

I checked the code [here|https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Learner.java]:
{code:java}
        if (self.isMultiAddressReachabilityCheckEnabled()) {
            // even if none of the addresses are reachable, we want to try to establish connection
            // see ZOOKEEPER-3758
            addresses = multiAddr.getAllReachableAddressesOrAll();
        } else {
            addresses = multiAddr.getAllAddresses();
        }

        ExecutorService executor = Executors.newFixedThreadPool(addresses.size());  
{code}
From this, I suspect the problem is related to *multiAddress.reachabilityCheckEnabled*, so I turned it *off (false)*. After that, the cluster started as expected.

So could you please:
 * Update the documentation [here|http://zookeeper.apache.org/doc/r3.6.0/zookeeperAdmin.html] for _multiAddress.reachabilityCheckEnabled_, because it takes effect even when _multiAddress.enabled=false_ (which is the default)
 * Check the code in Learner.java to make sure _addresses.size()_ is always greater than 0
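
One possible shape for the check in the second request is sketched below. This is only an assumption about how such a guard could look, not the actual or proposed ZooKeeper fix: the class LeaderConnectGuardSketch, the method newConnectExecutor, and the host name leader.example are all hypothetical. The idea is to fail with a descriptive message when the address set is empty instead of letting ThreadPoolExecutor throw a bare IllegalArgumentException.

```java
import java.net.InetSocketAddress;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration only: this class and its method are hypothetical,
// not part of ZooKeeper's Learner.java.
class LeaderConnectGuardSketch {

    // Creates one thread per candidate address, but refuses an empty set
    // with an explicit message instead of a bare IllegalArgumentException.
    static ExecutorService newConnectExecutor(Set<InetSocketAddress> addresses) {
        if (addresses == null || addresses.isEmpty()) {
            throw new IllegalStateException(
                "No leader addresses to connect to; check the server address configuration");
        }
        return Executors.newFixedThreadPool(addresses.size());
    }

    public static void main(String[] args) {
        // createUnresolved avoids a DNS lookup; the host name is a placeholder.
        Set<InetSocketAddress> one =
            Collections.singleton(InetSocketAddress.createUnresolved("leader.example", 2888));
        ExecutorService executor = newConnectExecutor(one);
        executor.shutdown();

        try {
            newConnectExecutor(Collections.<InetSocketAddress>emptySet());
        } catch (IllegalStateException e) {
            System.out.println("empty set rejected: " + e.getMessage());
        }
    }
}
```

Whether an empty set should be an error or should fall back to the full address list is a design decision for the maintainers; the sketch only makes the failure explicit.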


