Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2020/06/22 03:23:22 UTC

[GitHub] [zookeeper] yfxhust commented on a change in pull request #1247: ZOOKEEPER-3713: ReadOnlyZooKeeperServer should not expose the uniniti…

yfxhust commented on a change in pull request #1247:
URL: https://github.com/apache/zookeeper/pull/1247#discussion_r443299864



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java
##########
@@ -1350,6 +1350,15 @@ public void run() {
                     ServerMetrics.getMetrics().LOOKING_COUNT.add(1);
 
                     if (Boolean.getBoolean("readonlymode.enabled")) {
+                        if (!zkDb.isInitialized()) {

Review comment:
       @lvfangmin Thank you for the reply. Yes, I think you are right. But my patch tries to address the following scenario. Maybe you can help me find the mistake.
   
   1. At time T0, Observer O1 loses all network connectivity. The outage lasts for a long time (it can be simulated by dropping packets with iptables). Observer O1 therefore falls into the LOOKING state and serves as a read-only server. Everything is OK here.
   
   2. At time T1, Observer O1's network recovers and it reconnects to the Leader. Because the quorum has accumulated too many new transactions and the observer has fallen too far behind, O1 uses deserializeSnapshot() to sync its ZKDatabase with the leader. deserializeSnapshot() first calls clear() to clear the ZKDatabase and then calls SerializeUtils.deserializeSnapshot() to restore the dataTree. The window between clear() and SerializeUtils.deserializeSnapshot() is dangerous because the dataTree may be null during it. I think this window is not protected internally and can be hit by a ZooKeeper client.
   
   3. At time T2, Observer O1 finishes syncing with the leader. It returns from the blocking setCurrentVote(makeLEStrategy().lookForLeader()) call and shuts down the read-only server.
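   The read-only lifecycle in steps 1 and 3 can be modeled with a minimal Java sketch, loosely following the shape of the LOOKING branch in QuorumPeer.run(): start a read-only server once the peer still looks partitioned, block on leader election, and always shut the read-only server down when election returns. ToyReadOnlyServer, runLooking(), the grace period, and the CountDownLatch standing in for lookForLeader() are all hypothetical and are not ZooKeeper's actual classes or API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ReadOnlyLifecycleSketch {
    // Hypothetical stand-in for ReadOnlyZooKeeperServer; NOT the real class.
    public static class ToyReadOnlyServer {
        public final AtomicInteger startups = new AtomicInteger();
        public final CountDownLatch started = new CountDownLatch(1);
        public volatile boolean running = false;

        public void startup() {
            running = true;
            startups.incrementAndGet();
            started.countDown();
        }

        public void shutdown() {
            running = false;
        }
    }

    // Models the LOOKING branch: start the read-only server after a grace period,
    // block on leader election, then always shut the read-only server down.
    public static String runLooking(ToyReadOnlyServer roZk, CountDownLatch electionDone, long graceMillis)
            throws InterruptedException {
        Thread roZkMgr = new Thread(() -> {
            try {
                Thread.sleep(graceMillis); // still no leader after the grace period?
                roZk.startup();            // step 1: serve reads while LOOKING
            } catch (InterruptedException e) {
                // Election finished during the grace period: never start.
            }
        });
        roZkMgr.start();
        try {
            electionDone.await();          // stands in for the blocking lookForLeader()
            return "leader-found";
        } finally {
            roZkMgr.interrupt();
            roZkMgr.join();                // no startup() can slip in after shutdown()
            roZk.shutdown();               // step 3: the read-only server goes away
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyReadOnlyServer roZk = new ToyReadOnlyServer();
        CountDownLatch electionDone = new CountDownLatch(1);
        // Simulated network recovery: election completes only after the server is up.
        Thread recovery = new Thread(() -> {
            try {
                roZk.started.await();
                electionDone.countDown();
            } catch (InterruptedException ignored) {
            }
        });
        recovery.start();
        String vote = runLooking(roZk, electionDone, 50);
        recovery.join();
        System.out.println(vote + " running=" + roZk.running + " startups=" + roZk.startups.get());
    }
}
```

   Running main prints `leader-found running=false startups=1`. Joining the manager thread before shutdown() is a choice of this sketch so that a late startup() can never outlive the shutdown.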
   
   Please let me know if I have missed something in the reasoning above.
   If my reasoning is correct, my patch tries to close the dangerous window in step 2 (at time T1).
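   One possible way to keep the step-2 window away from clients is sketched below in Java, under stated assumptions. ToyDatabase, syncWithLeader(), and readOnlyGet() are hypothetical and are not the real ZKDatabase API. The isInitialized() check mirrors the guard visible in the diff; the ReentrantReadWriteLock that makes clear() plus snapshot restore atomic with respect to readers is this sketch's own choice and is not necessarily what the patch does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class InitGuardSketch {
    // Hypothetical stand-in for ZKDatabase; NOT the real ZooKeeper class or API.
    public static class ToyDatabase {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private Map<String, String> dataTree = null; // null until the first load
        private boolean initialized = false;

        private void clear() {
            dataTree = null;
            initialized = false;
        }

        private void deserializeSnapshot(Map<String, String> snapshot) {
            dataTree = new HashMap<>(snapshot);
            initialized = true;
        }

        // Step 2 of the scenario: clear(), then restore from the leader's snapshot.
        // Holding the write lock across both calls hides the window from readers.
        public void syncWithLeader(Map<String, String> snapshot) {
            lock.writeLock().lock();
            try {
                clear();
                deserializeSnapshot(snapshot);
            } finally {
                lock.writeLock().unlock();
            }
        }

        public boolean isInitialized() {
            lock.readLock().lock();
            try {
                return initialized;
            } finally {
                lock.readLock().unlock();
            }
        }

        // Read path of a hypothetical read-only server: reject the request
        // instead of dereferencing a null or half-restored tree.
        public String readOnlyGet(String path) {
            lock.readLock().lock();
            try {
                if (!initialized) {
                    throw new IllegalStateException("database not initialized; rejecting read of " + path);
                }
                return dataTree.get(path);
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        ToyDatabase db = new ToyDatabase();
        try {
            db.readOnlyGet("/app");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        Map<String, String> snapshot = new HashMap<>();
        snapshot.put("/app", "v1");
        db.syncWithLeader(snapshot);
        System.out.println("read: " + db.readOnlyGet("/app"));
    }
}
```

   Running main prints `rejected: database not initialized; rejecting read of /app` followed by `read: v1`. The trade-off of the lock is that read-only clients block for the duration of a snapshot load rather than observing a null dataTree.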
   
   Thank you!
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org