You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2019/12/20 07:13:02 UTC

[GitHub] [zookeeper] lvfangmin opened a new pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

lvfangmin opened a new pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] lvfangmin commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
lvfangmin commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#issuecomment-568073273
 
 
   retest maven build

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#discussion_r371788520
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1078,6 +1078,77 @@ property, when available, is noted below.
     effect due to TLS handshake timeout when there are too many in-flight TLS 
     handshakes. Set it to something like 250 is good enough to avoid herd effect.
 
+* *leader.snapPingIntervalInSeconds*
+    (Jave system property only: **zookeeper.leader.snapPingIntervalInSeconds**)
+    Set the interval of snapshot scheduler, this is also the switch for 
+    enabling/disabling snapshot scheduler. 
+    
+    Snapshot scheduler is the feature used to coordinate the time of snapshot 
+    happens in the quorum, which avoid high latency issue due to majority of 
+    servers taking snapshot at the same time when running on a single disk 
+    driver.
+
+    A new quorum packet is added: SNAPPING, but it's backwards compatible and can be 
+    rolled out safely with rolling restart. Leader will check and start the snapshot 
+    scheduler if it's enabled, and send SNAPPING to the quorum. If the follower is 
+    running old code, it will ignore that packet. When follower with new code received 
+    SNAPPING packet, it will turn off the periodically snapshot locally, and only 
+    taking safety snapshot if the if the txns since last snapshot is much larger than 
+    the threshold defined in SyncRequestProcessor. This is used to avoid issues like 
+    the follower accumulated too many txns before it is scheduled to take snapshot.
+    
+    The default value is -1, which disables the central snapshot scheduler in 
+    quorum. The suggest value would be 20s, which means it checks and schedule 
+    the next round of snapshot every 20s. Note that each round will only schedule 
+    at most one server to take snapshot.
+
+    Also there is a JMX setting on leader to turn it on and off in flight.
+
+* *leader.snapTxnsThreshold*
+    (Jave system property only: **zookeeper.leader.snapTxnsThreshold**)
 
 Review comment:
   Typo: 'Jave'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
anmolnar commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#issuecomment-576778442
 
 
   @lvfangmin I second @hanm 's questions. This patch is quite huge, are you still working on it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#discussion_r371794489
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Follower.java
 ##########
 @@ -148,13 +155,95 @@ void followLeader() throws InterruptedException {
         }
     }
 
+    // Visible for testing
+    protected void snapPing(QuorumPacket qp) throws IOException {
+        int peerSnapPingVersion = -1;
+        long snapPingId = -1;
+        int snapCode = -1;
+        try {
+            ByteArrayInputStream bis = new ByteArrayInputStream(qp.getData());
+            DataInputStream dis = new DataInputStream(bis);
+            peerSnapPingVersion = dis.readInt();
+            snapPingId = dis.readLong();
+            snapCode = dis.readInt();
+        } catch (IOException e) {
+            LOG.error("Error while parsing SnapPing packet", e);
+            return;
+        }
+
+        if (peerSnapPingVersion != SnapPingManager.SNAP_PING_VERSION) {
+            LOG.warn("The SnapPing version on leader is {} which is not "
+                    + "compatible with ours {}, will skip", peerSnapPingVersion,
+                    SnapPingManager.SNAP_PING_VERSION);
+            if (fzk.syncProcessor.isOnlySnapWhenSafetyIsThreatened()) {
+                LOG.info("SnapPing version imcompatible, start self snapshot");
 
 Review comment:
   Typo: 'imcompatible'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#discussion_r371792046
 
 

 ##########
 File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/SnapshotGenerator.java
 ##########
 @@ -0,0 +1,105 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server;
+
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Util class Used to control the behavior abouthow we take snapshot.
+ */
+public class SnapshotGenerator {
+    private static final Logger LOG = LoggerFactory.getLogger(SnapshotGenerator.class);
+
+    public static final String PURGE_AFTER_SNAPSHOT = "zookeeper.purgeAfterSnapshot.enabled";
+    private static boolean purgeAfterSnapshot;
+
+    public static final String FSYNC_SNAPSHOT_FROM_SCHEDULER = "zookeeper.fsyncSnapshotFromScheduler";
+    private static boolean fsyncSnapshotFromScheduler;
+
+    static {
+        purgeAfterSnapshot = Boolean.getBoolean(PURGE_AFTER_SNAPSHOT);
+        LOG.info("{} = {}", PURGE_AFTER_SNAPSHOT, purgeAfterSnapshot);
+
+        fsyncSnapshotFromScheduler = Boolean.parseBoolean(
+                System.getProperty(FSYNC_SNAPSHOT_FROM_SCHEDULER, "true"));
+        LOG.info("{} = {}", FSYNC_SNAPSHOT_FROM_SCHEDULER, fsyncSnapshotFromScheduler);
+    }
+
+    public static boolean getPurgeAfterSnapshot() {
+        return purgeAfterSnapshot;
+    }
+
+    public static void setPurgeAfterSnapshot(boolean enabled) {
+        purgeAfterSnapshot = enabled;
+        LOG.info("{} = {}", PURGE_AFTER_SNAPSHOT, purgeAfterSnapshot);
+    }
+
+    public static void setFsyncSnapshotFromScheduler(boolean fsync) {
+        fsyncSnapshotFromScheduler = fsync;
+        LOG.info("{} = {}", FSYNC_SNAPSHOT_FROM_SCHEDULER, fsyncSnapshotFromScheduler);
+    }
+
+    public static boolean getFsyncSnapshotFromScheduler() {
+        return fsyncSnapshotFromScheduler;
+    }
+
+    private final ZooKeeperServer zks;
+    private final ExecutorService worker;
+    private final AtomicBoolean isTakingSnapshot;
+
+    public SnapshotGenerator(final ZooKeeperServer zks) {
+        this.zks = zks;
+        this.worker = Executors.newFixedThreadPool(1);
+        this.isTakingSnapshot = new AtomicBoolean(false);
+    }
+
+    public boolean takeSnapshot(boolean syncSnap) {
+        // Only allow a single snapshot in progress.
+        if (isTakingSnapshot.compareAndSet(false, true)) {
+            this.worker.execute(new Runnable() {
+                @Override
+                public void run() {
+                    try {
+                        zks.takeSnapshot(syncSnap);
+                        if (purgeAfterSnapshot) {
+                            zks.purge();
+                        }
+                    } catch (Exception e) {
+                        LOG.warn("Unexpected exception", e);
+                    } finally {
+                        isTakingSnapshot.compareAndSet(true, false);
+                    }
+                }
+            });
+            return true;
+        } else {
+            LOG.warn("Too busy to snap, skipping");
 
 Review comment:
   Is this log message accurate? Getting here means previous snapshot is still running.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] hanm commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
hanm commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#issuecomment-568156227
 
 
   interesting. two quick questions:
   
   * zookeeper has some randomness in today's sync processor when snapshotting already, i am curious to know what's the short comings of existing approach that motivates this change.
   
   * i don't see observers processing snap ping packet in this pull request so only quorum servers will be scheduled by leader for snap shot generation?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] lvfangmin commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
lvfangmin commented on issue #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#issuecomment-578455973
 
 
   @hanm the randomness snapshot could introduce high latency if majority taking snapshot at the same time, when the total DataTree size increasing, it will take longer time to do snapshot, which means it's more likely majority will take snapshot at the same time with longer period. Which will be a problem when running ZK on a single disk driver.
   
   From what we saw in benchmark, the write throughput within SLA for 6GB DataTree size is more than 10X smaller than 100MB DataTree.
   
   That's why we introduced this feature. Observer don't need to handle SNAPPING, since the quorum ack latency is only affected by participants.
   
   @anmolnar this feature is complete, it has been on our prod for more than 6 months. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#discussion_r371788329
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1078,6 +1078,77 @@ property, when available, is noted below.
     effect due to TLS handshake timeout when there are too many in-flight TLS 
     handshakes. Set it to something like 250 is good enough to avoid herd effect.
 
+* *leader.snapPingIntervalInSeconds*
+    (Jave system property only: **zookeeper.leader.snapPingIntervalInSeconds**)
+    Set the interval of snapshot scheduler, this is also the switch for 
+    enabling/disabling snapshot scheduler. 
+    
+    Snapshot scheduler is the feature used to coordinate the time of snapshot 
+    happens in the quorum, which avoid high latency issue due to majority of 
+    servers taking snapshot at the same time when running on a single disk 
+    driver.
+
+    A new quorum packet is added: SNAPPING, but it's backwards compatible and can be 
+    rolled out safely with rolling restart. Leader will check and start the snapshot 
+    scheduler if it's enabled, and send SNAPPING to the quorum. If the follower is 
+    running old code, it will ignore that packet. When follower with new code received 
+    SNAPPING packet, it will turn off the periodically snapshot locally, and only 
+    taking safety snapshot if the if the txns since last snapshot is much larger than 
 
 Review comment:
   Fix typo: 'if the if the'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#discussion_r371788058
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1078,6 +1078,77 @@ property, when available, is noted below.
     effect due to TLS handshake timeout when there are too many in-flight TLS 
     handshakes. Set it to something like 250 is good enough to avoid herd effect.
 
+* *leader.snapPingIntervalInSeconds*
+    (Jave system property only: **zookeeper.leader.snapPingIntervalInSeconds**)
 
 Review comment:
   Fix typo please: 'Jave'

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [zookeeper] anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention

Posted by GitBox <gi...@apache.org>.
anmolnar commented on a change in pull request #1191: [ZOOKEEPER-3657] Implementing snapshot schedule to avoid high latency issue due to disk contention
URL: https://github.com/apache/zookeeper/pull/1191#discussion_r371788640
 
 

 ##########
 File path: zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
 ##########
 @@ -1078,6 +1078,77 @@ property, when available, is noted below.
     effect due to TLS handshake timeout when there are too many in-flight TLS 
     handshakes. Set it to something like 250 is good enough to avoid herd effect.
 
+* *leader.snapPingIntervalInSeconds*
+    (Jave system property only: **zookeeper.leader.snapPingIntervalInSeconds**)
+    Set the interval of snapshot scheduler, this is also the switch for 
+    enabling/disabling snapshot scheduler. 
+    
+    Snapshot scheduler is the feature used to coordinate the time of snapshot 
+    happens in the quorum, which avoid high latency issue due to majority of 
+    servers taking snapshot at the same time when running on a single disk 
+    driver.
+
+    A new quorum packet is added: SNAPPING, but it's backwards compatible and can be 
+    rolled out safely with rolling restart. Leader will check and start the snapshot 
+    scheduler if it's enabled, and send SNAPPING to the quorum. If the follower is 
+    running old code, it will ignore that packet. When follower with new code received 
+    SNAPPING packet, it will turn off the periodically snapshot locally, and only 
+    taking safety snapshot if the if the txns since last snapshot is much larger than 
+    the threshold defined in SyncRequestProcessor. This is used to avoid issues like 
+    the follower accumulated too many txns before it is scheduled to take snapshot.
+    
+    The default value is -1, which disables the central snapshot scheduler in 
+    quorum. The suggest value would be 20s, which means it checks and schedule 
+    the next round of snapshot every 20s. Note that each round will only schedule 
+    at most one server to take snapshot.
+
+    Also there is a JMX setting on leader to turn it on and off in flight.
+
+* *leader.snapTxnsThreshold*
+    (Jave system property only: **zookeeper.leader.snapTxnsThreshold**)
+    The minimal number of txns to schedule snapshot since last snapshot. The
+    default value is 100,000 which is the suggested value.
+
+* *leader.snapTxnsSizeThresholdKB*
+    (Jave system property only: **zookeeper.leader.snapTxnsSizeThresholdKB**)
 
 Review comment:
   Typo: 'Jave' (looks like a copy-paste problem)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services