You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2020/07/20 13:25:42 UTC

[GitHub] [zookeeper] maoling opened a new pull request #1406: ZOOKEEPER-3823: Add a benchmark tool for testing watch feature performance

maoling opened a new pull request #1406:
URL: https://github.com/apache/zookeeper/pull/1406


   - [Here](https://docs.google.com/document/d/1PuYQWnvyDudrsQT9htXD02_pBX_e21xf2t-rYBTuXsU/edit?usp=sharing) is a detailed design document. Copying its abstract here.
   	
   
   > This watch benchmark report is mainly organized as followed: Section 1 and 2: we introduce the resource, environment and software settings of our benchmark server. Section 3: we talk about the implementations of our watch benchmark tool. Section 4: we evaluate the experimental effects using our watch benchmark tool on the standard cloud machine from four dimensions: server mode, znode count, client threads, watch mode. Section 6: we look forward to our future work
   
   - Link to [PR-1011](https://github.com/apache/zookeeper/pull/1011)
   - A test evidence in `Windows `OS is included in [JIRA](https://issues.apache.org/jira/browse/ZOOKEEPER-3823)
   - More details in [ZOOKEEPER-3823](https://issues.apache.org/jira/browse/ZOOKEEPER-3823)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [zookeeper] nkalmar commented on a change in pull request #1406: ZOOKEEPER-3823: Add a benchmark tool for testing watch feature performance

Posted by GitBox <gi...@apache.org>.
nkalmar commented on a change in pull request #1406:
URL: https://github.com/apache/zookeeper/pull/1406#discussion_r466563259



##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatchBenchmarkTool.java
##########
@@ -0,0 +1,607 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.watch;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.zookeeper.AddWatchMode.PERSISTENT;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.RoundingMode;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Vector;
+import java.util.concurrent.BrokenBarrierException;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.PosixParser;
+import org.apache.commons.lang.RandomStringUtils;
+import org.apache.commons.lang.StringUtils;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.client.ZKClientConfig;
+import org.apache.zookeeper.common.PathUtils;
+import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A benchmark tool that benchmarks the watch throughput and latency.
+ * See ZOOKEEPER-3823 for the design document

Review comment:
       again, I think adding the jira number is unnecessary, as it does not contain a design document, or any further information.

##########
File path: zookeeper-docs/src/main/resources/markdown/zookeeperTools.md
##########
@@ -425,6 +426,38 @@ All layers compared.
 
 Or use `^c` to exit interactive mode anytime.
 
+
+<a name="zkWatchBenchmark"></a>
+
+### zkWatchBenchmark.sh
+- A benchmark tool that benchmarks the watch throughput and latency, which supports multiple clients threads, multiple watch modes
+- See **ZOOKEEPER-3823** for the design document

Review comment:
       this jira does not contain a design document. I don't think we need to link jira here.

##########
File path: zookeeper-docs/src/main/resources/markdown/zookeeperTools.md
##########
@@ -425,6 +426,38 @@ All layers compared.
 
 Or use `^c` to exit interactive mode anytime.
 
+
+<a name="zkWatchBenchmark"></a>
+
+### zkWatchBenchmark.sh
+- A benchmark tool that benchmarks the watch throughput and latency, which supports multiple clients threads, multiple watch modes
+- See **ZOOKEEPER-3823** for the design document
+- Notes:
+    - `export JVMFLAGS="-Xms12g -Xmx12g"` to have more JVM heap size to avoid `GC overhead limit exceeded` when you have a large scale testing
+    - `-v` to print a detailed logs when you find no benchmark output result for a long time
+- Usages:
+
+
+        nohup ./zkWatchBenchmark.sh -root_path /bench-watch -threads 100 -znode_count 1000 -force -connect_string \n

Review comment:
       I think nohup is unnecessary to include in the example command (while useful here I agree, there are other options, and everyone can decide how to handle the long runtime - at least this is how I see it, not a blocker though)

##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatchBenchmarkTool.java
##########
@@ -0,0 +1,607 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.watch;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.zookeeper.AddWatchMode.PERSISTENT;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.RoundingMode;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Vector;
+import java.util.concurrent.BrokenBarrierException;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.PosixParser;
+import org.apache.commons.lang.RandomStringUtils;
+import org.apache.commons.lang.StringUtils;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.client.ZKClientConfig;
+import org.apache.zookeeper.common.PathUtils;
+import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A benchmark tool that benchmarks the watch throughput and latency.
+ * See ZOOKEEPER-3823 for the design document
+ */
+
+public class WatchBenchmarkTool {
+    private static final Logger LOG = LoggerFactory.getLogger(WatchBenchmarkTool.class);
+
+    private static final ConcurrentHashMap<Integer, Long> watchTriggerTimeMap = new ConcurrentHashMap<>();
+    private static final List<Long> latencyList = new Vector<>();
+    private static AtomicLong totalStartTriggerWatchTime = new AtomicLong(0);
+    private static int timeout;
+    private static boolean isDebug;
+
+    private static String rootPath;
+    private static int znodeCount;
+    private static int znodeSize = 1;
+    private static int clientThreads;
+    private static String connectString;
+    private static int sessionTimeout;
+    private static String configFilePath;
+    private static String watchMode;
+    private static int watchMultiple = 1;
+
+    public static void main(String[] args) throws Exception {
+        long totalStartTime = System.currentTimeMillis();
+        Options options = new Options();
+        options.addOption("connect_string", true, "ZooKeeper connectString. Default: 127.0.0.1:2181");
+        options.addOption("root_path", true, "Root Path for creating znodes for the benchmark. Not empty");
+        options.addOption("znode_count", true, "The znode count. Default: 1000");
+        options.addOption("znode_size", true, "The data length of per znode. Default: 1");
+        options.addOption("threads", true, "The client thread number. Default: 1");
+        options.addOption("session_timeout", true, "ZooKeeper sessionTimeout. Default: 40000 ms");
+        options.addOption("force", false, "Force to run the benchmark, even if root_path exists");
+        options.addOption("timeout", true, "Timeout for waiting for all watch events arrival. Default: 10000 ms");
+        options.addOption("client_configuration", true, "Client configuration file to set some special client setting. Default: empty");
+        options.addOption("watch_mode", true, "Watch mode. Optional value is t or p, corresponding to traditional one-off watch or persistent watch. Default: t");
+        options.addOption("watch_multiple", true, "Watch multiple times when enables persistent watch. Default: 1");
+        options.addOption("v", false, "Verbose output, print some logs for debugging");
+        options.addOption("help", false, "Help message");
+
+        CommandLineParser parser = new PosixParser();
+        CommandLine cmd = parser.parse(options, args);
+
+        if (args.length == 0 || cmd.hasOption("help")) {
+            usage(options);
+            System.exit(-1);
+        }
+
+        checkParameters(cmd);
+
+        // submit tasks to thread pool
+        ExecutorService executorService = Executors.newFixedThreadPool(clientThreads);
+        CyclicBarrier createNodeCyclicBarrier = new CyclicBarrier(clientThreads);
+        CyclicBarrier setWatchCyclicBarrier = new CyclicBarrier(clientThreads);
+        CountDownLatch deleteNodeCountDownLatch = new CountDownLatch(clientThreads);
+        CountDownLatch finishWatchCountDownLatch = new CountDownLatch(watchMultiple * clientThreads * znodeCount);
+        CountDownLatch closeClientCountDownLatch = new CountDownLatch(1);
+        AtomicBoolean syncOnce = new AtomicBoolean(false);
+        for (int i = 0; i < clientThreads; i++) {
+            executorService.execute(new WatchClientThread(i, createNodeCyclicBarrier,
+                    setWatchCyclicBarrier, deleteNodeCountDownLatch, finishWatchCountDownLatch, closeClientCountDownLatch, syncOnce));
+        }
+
+        // wait for deleting all nodes
+        long deleteAwaitStart = System.currentTimeMillis();
+        deleteNodeCountDownLatch.await();
+        if (isDebug) {

Review comment:
       For readability, I would name this "isVerbose" or something similar. 

##########
File path: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatchBenchmarkTool.java
##########
@@ -0,0 +1,607 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.watch;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+import static org.apache.zookeeper.AddWatchMode.PERSISTENT;
+import java.io.IOException;
+import java.math.BigDecimal;
+import java.math.RoundingMode;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Vector;
+import java.util.concurrent.BrokenBarrierException;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicLong;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.PosixParser;
+import org.apache.commons.lang.RandomStringUtils;
+import org.apache.commons.lang.StringUtils;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.client.ZKClientConfig;
+import org.apache.zookeeper.common.PathUtils;
+import org.apache.zookeeper.server.quorum.QuorumPeerConfig;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A benchmark tool that benchmarks the watch throughput and latency.
+ * See ZOOKEEPER-3823 for the design document
+ */
+
+public class WatchBenchmarkTool {
+    private static final Logger LOG = LoggerFactory.getLogger(WatchBenchmarkTool.class);
+
+    private static final ConcurrentHashMap<Integer, Long> watchTriggerTimeMap = new ConcurrentHashMap<>();
+    private static final List<Long> latencyList = new Vector<>();
+    private static AtomicLong totalStartTriggerWatchTime = new AtomicLong(0);
+    private static int timeout;
+    private static boolean isDebug;
+
+    private static String rootPath;
+    private static int znodeCount;
+    private static int znodeSize = 1;
+    private static int clientThreads;
+    private static String connectString;
+    private static int sessionTimeout;
+    private static String configFilePath;
+    private static String watchMode;
+    private static int watchMultiple = 1;
+
+    public static void main(String[] args) throws Exception {
+        long totalStartTime = System.currentTimeMillis();
+        Options options = new Options();
+        options.addOption("connect_string", true, "ZooKeeper connectString. Default: 127.0.0.1:2181");
+        options.addOption("root_path", true, "Root Path for creating znodes for the benchmark. Not empty");
+        options.addOption("znode_count", true, "The znode count. Default: 1000");
+        options.addOption("znode_size", true, "The data length of per znode. Default: 1");
+        options.addOption("threads", true, "The client thread number. Default: 1");
+        options.addOption("session_timeout", true, "ZooKeeper sessionTimeout. Default: 40000 ms");
+        options.addOption("force", false, "Force to run the benchmark, even if root_path exists");
+        options.addOption("timeout", true, "Timeout for waiting for all watch events arrival. Default: 10000 ms");
+        options.addOption("client_configuration", true, "Client configuration file to set some special client setting. Default: empty");
+        options.addOption("watch_mode", true, "Watch mode. Optional value is t or p, corresponding to traditional one-off watch or persistent watch. Default: t");
+        options.addOption("watch_multiple", true, "Watch multiple times when enables persistent watch. Default: 1");
+        options.addOption("v", false, "Verbose output, print some logs for debugging");
+        options.addOption("help", false, "Help message");
+
+        CommandLineParser parser = new PosixParser();
+        CommandLine cmd = parser.parse(options, args);
+
+        if (args.length == 0 || cmd.hasOption("help")) {
+            usage(options);
+            System.exit(-1);
+        }
+
+        checkParameters(cmd);
+
+        // submit tasks to thread pool
+        ExecutorService executorService = Executors.newFixedThreadPool(clientThreads);
+        CyclicBarrier createNodeCyclicBarrier = new CyclicBarrier(clientThreads);
+        CyclicBarrier setWatchCyclicBarrier = new CyclicBarrier(clientThreads);
+        CountDownLatch deleteNodeCountDownLatch = new CountDownLatch(clientThreads);
+        CountDownLatch finishWatchCountDownLatch = new CountDownLatch(watchMultiple * clientThreads * znodeCount);
+        CountDownLatch closeClientCountDownLatch = new CountDownLatch(1);
+        AtomicBoolean syncOnce = new AtomicBoolean(false);
+        for (int i = 0; i < clientThreads; i++) {
+            executorService.execute(new WatchClientThread(i, createNodeCyclicBarrier,
+                    setWatchCyclicBarrier, deleteNodeCountDownLatch, finishWatchCountDownLatch, closeClientCountDownLatch, syncOnce));
+        }
+
+        // wait for deleting all nodes
+        long deleteAwaitStart = System.currentTimeMillis();
+        deleteNodeCountDownLatch.await();
+        if (isDebug) {
+            LOG.info("deleteNodeCountDownLatch await time spent: {} ms", (System.currentTimeMillis() - deleteAwaitStart));
+        }
+
+        /** wait for all watch events arrival, especially network latency or overhead workloads
+         *  In most cases, when znodes have been deleted, most of the watch events has been notified
+         */
+        long finishWatchAwaitStart = System.currentTimeMillis();
+        boolean finishAwaitFlag = finishWatchCountDownLatch.await(timeout, TimeUnit.MILLISECONDS);
+        if (isDebug) {
+            LOG.info("finishWatchCountDownLatch await time spent: {} ms, awaitFlag:{}", (System.currentTimeMillis() - finishWatchAwaitStart), finishAwaitFlag);
+        }
+        long latencyListSnapshotSize = latencyList.size();
+        long endTime = System.currentTimeMillis();
+        long totalWatchSpentTime = endTime - totalStartTriggerWatchTime.longValue();
+        if (isDebug) {
+            LOG.info("totalStartTriggerWatchTime: {}, endTime: {}, totalWatchSpentTime: {} ms ", totalStartTriggerWatchTime, endTime, totalWatchSpentTime);
+        }
+
+        // close all the zk clients
+        closeClientCountDownLatch.countDown();
+        // shutdown thread pool
+        shutDownThreadPool(executorService);
+        // show the summary
+        showBenchmarkReport(totalStartTime, totalWatchSpentTime, latencyListSnapshotSize);
+    }
+
+    private static void showBenchmarkReport(long totalStartTime, long totalWatchSpentTime, long latencyListSnapshotSize) {
+        if (latencyListSnapshotSize == 0) {
+            System.out.println("Latency list is empty, cannot show the benchmark report");
+            return;
+        }
+
+        /**
+         * A deep copy of the latencyList, to avoid this situation when we statistics latencyList
+         * at the same time, watch events in flight are added to latencyList concurrently.
+         */
+        List<Long> copyLatencyList = new LinkedList<>();
+        copyLatencyList.addAll(latencyList);
+        // Now, we can clear the latencyList to save the memory
+        latencyList.clear();
+
+        // receive, loss notifications count and ratio summary
+        double receivedRatio = (double) copyLatencyList.size() / (double) (watchMultiple * clientThreads * znodeCount);
+        long lossCount = watchMultiple * clientThreads * znodeCount - copyLatencyList.size();
+        double lossRatio = (double) lossCount / (double) (watchMultiple * clientThreads * znodeCount);
+        System.out.println();
+        System.out.println("Notification expected count: " + (watchMultiple * clientThreads * znodeCount)
+                + ", received count: " + copyLatencyList.size() + " (" + getFormatedDouble(receivedRatio) + ")"
+                + ", loss count: " + lossCount + " (" + getFormatedDouble(lossRatio) + ")");
+
+        // latency distribution
+        printLatencyDistribution(copyLatencyList);
+
+        // throughput
+        double timeInμsPerNotification = (double) (totalWatchSpentTime * 1000) / (double) latencyListSnapshotSize;
+        if (isDebug) {
+            LOG.info("timeInμsPerNotification: {} μs, latencyListSnapshotSize:{}, latencyList.size():{}", timeInμsPerNotification, latencyListSnapshotSize, latencyList.size());
+        }
+        System.out.println("Total time:" + (System.currentTimeMillis() - totalStartTime) + " ms, watch benchmark total time: "
+                + totalWatchSpentTime + " ms, throughput:" + getFormatedDouble(1000 * 1000 / timeInμsPerNotification) + " op/s");
+
+    }
+
+    private static void shutDownThreadPool(ExecutorService executorService) {
+        long shutDownStart = System.currentTimeMillis();
+        executorService.shutdown();
+        while (true) {

Review comment:
       executorService.awaitTermination(); isntead?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org