Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2020/01/28 13:30:54 UTC

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#discussion_r371788520
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1078,6 +1078,77 @@ property, when available, is noted below.
     effect due to TLS handshake timeout when there are too many in-flight TLS 
     handshakes. Set it to something like 250 is good enough to avoid herd effect.
 
+* *leader.snapPingIntervalInSeconds*
+    (Jave system property only: **zookeeper.leader.snapPingIntervalInSeconds**)
+    Set the interval of snapshot scheduler, this is also the switch for 
+    enabling/disabling snapshot scheduler. 
+    
+    Snapshot scheduler is the feature used to coordinate the time of snapshot 
+    happens in the quorum, which avoid high latency issue due to majority of 
+    servers taking snapshot at the same time when running on a single disk 
+    driver.
+
+    A new quorum packet is added: SNAPPING, but it's backwards compatible and can be 
+    rolled out safely with rolling restart. Leader will check and start the snapshot 
+    scheduler if it's enabled, and send SNAPPING to the quorum. If the follower is 
+    running old code, it will ignore that packet. When follower with new code received 
+    SNAPPING packet, it will turn off the periodically snapshot locally, and only 
+    taking safety snapshot if the if the txns since last snapshot is much larger than 
+    the threshold defined in SyncRequestProcessor. This is used to avoid issues like 
+    the follower accumulated too many txns before it is scheduled to take snapshot.
+    
+    The default value is -1, which disables the central snapshot scheduler in 
+    quorum. The suggest value would be 20s, which means it checks and schedule 
+    the next round of snapshot every 20s. Note that each round will only schedule 
+    at most one server to take snapshot.
+
+    Also there is a JMX setting on leader to turn it on and off in flight.
+
+* *leader.snapTxnsThreshold*
+    (Jave system property only: **zookeeper.leader.snapTxnsThreshold**)
 
 Review comment:
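   Typo: 'Jave' should be 'Java'. The same typo also appears in the
   *leader.snapPingIntervalInSeconds* entry earlier in this hunk.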
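
   For anyone double-checking the property name once the typo is fixed, here is a
   minimal sketch of how a Java system property like the one documented in this hunk
   is typically read. It is not the code from this pull request: the class name
   `SnapPingSketch` and both helper methods are hypothetical, and treating every
   non-positive value as "disabled" is an assumption (the doc text only states that
   the default of -1 disables the scheduler).

```java
public class SnapPingSketch {
    // Property name copied from the documentation text in the hunk.
    static final String PROP = "zookeeper.leader.snapPingIntervalInSeconds";

    // Reads the interval from the JVM system properties.
    // Falls back to -1, the documented default, when the property is unset.
    static int readIntervalSeconds() {
        return Integer.getInteger(PROP, -1);
    }

    // Assumption: any non-positive interval means the scheduler is off.
    // The documentation only guarantees this for -1.
    static boolean schedulerEnabled(int intervalSeconds) {
        return intervalSeconds > 0;
    }

    public static void main(String[] args) {
        int interval = readIntervalSeconds();
        if (schedulerEnabled(interval)) {
            System.out.println("snapshot scheduler on, interval " + interval + "s");
        } else {
            System.out.println("snapshot scheduler off");
        }
    }
}
```

   Running the sketch with `java -Dzookeeper.leader.snapPingIntervalInSeconds=20 SnapPingSketch`
   would exercise the suggested 20 second interval, assuming the property takes a
   plain integer number of seconds as its name implies.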
