Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/08/24 08:49:50 UTC

[GitHub] [hbase] lokiore opened a new pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

lokiore opened a new pull request #2299:
URL: https://github.com/apache/hbase/pull/2299


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679605698


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 37s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  4s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 39s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 18s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 32s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 14s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 22s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 22s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 22s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 33s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 17s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 285m  6s |  root in the patch failed.  |
   |  |   | 317m 17s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 42654b2c5184 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 6ad73b9668 |
   | Default Java | 1.8.0_232 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/testReport/ |
   | Max. process+thread count | 3842 (vs. ulimit of 12500) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-695826384


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 27s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 36s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   5m 26s |  master passed  |
   | +1 :green_heart: |  compile  |   3m 38s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   7m 52s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 14s |  root in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 30s |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 30s |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 30s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   8m  9s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 20s |  hbase-it in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 19s |  root in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 169m 12s |  root in the patch passed.  |
   |  |   | 209m  0s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux d61ad358be5f 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d41eb4f0f4 |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-root.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-it.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/testReport/ |
   | Max. process+thread count | 6104 (vs. ulimit of 12500) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] virajjasani commented on a change in pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#discussion_r545872702



##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");
+          throw new RuntimeException(e);
+        }
+      } else {
+        LOG.error("Invalid Options");
+      }
+    } catch (Exception e) {
+      LOG.error("Error while starting ChaosService : " + e);
+    }
+  }
+
+  private static void ChaosServiceStart(Configuration conf, ChaosServiceName serviceName) {
+    switch (serviceName) {
+      case chaosagent:
+        ChaosAgent.stopChaosAgent.set(false);
+        try {
+          Thread t = new Thread(new ChaosAgent(conf,
+            ChaosUtils.getZKQuorum(conf), ChaosUtils.getHostName()));
+          t.start();
+          t.join();

Review comment:
       Sure, this can be a follow-up.

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchers and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());

Review comment:
       @lokiore I think here we can provide a better exception than `assert`. Something like:
   ```
   if (!(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
             ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath())) {
       throw new RuntimeException(xxx); // or something better
   }
   ```
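
For illustration only, the suggestion above could look roughly like the following self-contained sketch. Unlike `assert`, which the JVM skips unless assertions are enabled with `-ea`, an explicit check always runs. The class name `WatchedPathCheck`, the helper `verifyEventPath`, the choice of `IllegalStateException`, and the literal znode path are all assumptions made for this example (the path is inferred from the layout described in the ChaosAgent Javadoc); none of them come from the pull request.

```java
/** Hypothetical, self-contained sketch; not the code from the pull request. */
public class WatchedPathCheck {

  // Stand-ins for the ChaosConstants values. The literal values are assumptions
  // based on the ZNode layout described in the ChaosAgent Javadoc.
  static final String STATUS_ZNODE = "/perfChaosTest/chaosAgentTaskStatus";
  static final String PATH_SEPARATOR = "/";

  /** Throws if the event path is not this agent's task-status znode. */
  static void verifyEventPath(String agentName, String eventPath) {
    String expected = STATUS_ZNODE + PATH_SEPARATOR + agentName;
    if (!expected.equals(eventPath)) {
      // An unchecked exception with a descriptive message; runs regardless of -ea.
      throw new IllegalStateException(
        "Unexpected znode path in watch event: expected " + expected + " but got " + eventPath);
    }
  }

  public static void main(String[] args) {
    // Matching path: returns without throwing.
    verifyEventPath("host1", "/perfChaosTest/chaosAgentTaskStatus/host1");
    try {
      // Path of a different agent: rejected.
      verifyEventPath("host1", "/perfChaosTest/chaosAgentTaskStatus/host2");
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Whether an unchecked exception thrown inside a watcher callback is the right failure mode is a design decision for the patch author; logging and returning early would be another option.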

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchers and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting status setting again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exists: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Persistent ZNode tries to create
+   * ZNode again if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Ephemeral ZNode tries to create ZNode again
+   * if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: ",
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting command,
+   * after getting command successfully, it executes command and
+   * set its status with respect to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connection to the server has been lost while getting task, getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            new String(data));
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occured after setting status: " + e);
+          }
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Callback used while getting Tasks for agent if call executed without Exception,
+   * It creates a separate thread for each children to execute given Tasks parallely.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+
+        case OK:
+          if (children != null) {
+            try {
+
+              LOG.info("Executing each task as a separate thread");
+              List<Thread> tasksList = new ArrayList<>();
+              for (String task : children) {
+                String threadName = agentName + "_" + task;
+                Thread t = new Thread(() -> {

Review comment:
       This can also be a follow-up.
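
The quoted hunk starts one `Thread` per task znode, and this message does not say what the follow-up would change. One possibility, offered purely as an assumption, is to submit the tasks to a bounded `ExecutorService` instead. The sketch below is self-contained; `PooledTaskRunner`, `runAll`, and the pool size are hypothetical names and values, and the task body is a placeholder rather than the agent's real command execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; class and method names are not from the pull request. */
public class PooledTaskRunner {

  /** Runs one job per task name on a fixed-size pool and returns results in submission order. */
  static List<String> runAll(String agentName, List<String> tasks, int poolSize)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String task : tasks) {
        // Placeholder work: the real agent would fetch and execute the task's command here.
        futures.add(pool.submit(() -> agentName + "_" + task));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> future : futures) {
        results.add(future.get()); // blocks until that task has finished
      }
      return results;
    } finally {
      // Stop accepting work and give running tasks a bounded time to finish.
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runAll("host1", Arrays.asList("task0000001", "task0000002"), 2));
  }
}
```

A fixed pool caps the number of concurrent shell executions on the host, which a thread-per-task approach does not; whether that cap is desirable for a chaos agent is left to the follow-up discussion.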







[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-748478846


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   4m  1s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 37s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 13s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 48s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 41s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 53s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 57s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 50s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 50s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 43s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 58s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 172m 47s |  root in the patch passed.  |
   |  |   | 213m 57s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 0d6cea6dc5eb 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f600856a3b |
   | Default Java | AdoptOpenJDK-11.0.6+10 |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/testReport/ |
   | Max. process+thread count | 6752 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679466941


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 26s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m  0s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 45s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 41s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 15s |  root in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m  1s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 44s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 44s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 41s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 16s |  hbase-it in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 16s |  root in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 166m  9s |  root in the patch passed.  |
   |  |   | 196m  4s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 6b1095699ef1 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 6ad73b9668 |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-root.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-it.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/testReport/ |
   | Max. process+thread count | 6034 (vs. ulimit of 12500) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-732162196


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 49s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   5m  4s |  master passed  |
   | +1 :green_heart: |  compile  |   3m 31s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   8m  6s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   3m 44s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 45s |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 23s |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 23s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   7m 58s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   3m 42s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  |   1m 28s |  root in the patch failed.  |
   |  |   |  44m 27s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux a8bcbe6f5efe 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 1cd8f3cf94 |
   | Default Java | AdoptOpenJDK-11.0.6+10 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/testReport/ |
   | Max. process+thread count | 298 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679414368


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 30s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | -1 :x: |  dupname  |   0m  1s |  The patch has 1 duplicated filenames that differ only in case.  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux cb15ed3d96ca 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 6ad73b9668 |
   | dupname | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/artifact/yetus-general-check/output/dupnames.txt |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------



[GitHub] [hbase] virajjasani commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
virajjasani commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-695803576


   Let me re-build.


----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-695803812






----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-743932749


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 19s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 46s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 26s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 33s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 29s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 25s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 25s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 33s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 22s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 259m  7s |  root in the patch failed.  |
   |  |   | 293m 34s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 30484f0b948c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d50816fe44 |
   | Default Java | AdoptOpenJDK-1.8.0_232-b09 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/testReport/ |
   | Max. process+thread count | 4431 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679287067


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 45s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  Shelldocs was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 11s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   2m 24s |  master passed  |
   | +1 :green_heart: |  spotbugs  |  10m 56s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | -1 :x: |  mvninstall  |   3m 40s |  root in the patch failed.  |
   | -0 :warning: |  checkstyle  |   2m 11s |  root: The patch generated 25 new + 0 unchanged - 0 fixed = 25 total (was 0)  |
   | -0 :warning: |  shellcheck  |   0m  1s |  The patch generated 37 new + 0 unchanged - 0 fixed = 37 total (was 0)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | -1 :x: |  hadoopcheck  |   3m 15s |  The patch causes 12 errors with Hadoop v3.1.2.  |
   | -1 :x: |  hadoopcheck  |   6m 31s |  The patch causes 12 errors with Hadoop v3.2.1.  |
   | +1 :green_heart: |  spotbugs  |  11m 22s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 26s |  The patch does not generate ASF License warnings.  |
   |  |   |  45m  6s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux dddbab0dee1f 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 6ad73b9668 |
   | mvninstall | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-general-check/output/patch-mvninstall-root.txt |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-general-check/output/diff-checkstyle-root.txt |
   | shellcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-general-check/output/diff-patch-shellcheck.txt |
   | hadoopcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-general-check/output/patch-javac-3.1.2.txt |
   | hadoopcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-general-check/output/patch-javac-3.2.1.txt |
   | Max. process+thread count | 139 (vs. ulimit of 12500) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) shellcheck=0.4.6 spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-743910332


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 47s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  Shelldocs was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 37s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 50s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   2m 30s |  master passed  |
   | +1 :green_heart: |  spotbugs  |  10m 16s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 51s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 15s |  root: The patch generated 16 new + 0 unchanged - 0 fixed = 16 total (was 0)  |
   | -0 :warning: |  shellcheck  |   0m  1s |  The patch generated 26 new + 0 unchanged - 0 fixed = 26 total (was 0)  |
   | -0 :warning: |  whitespace  |   0m  0s |  The patch has 55 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  hadoopcheck  |  19m  6s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.  |
   | +1 :green_heart: |  spotbugs  |  10m 53s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 25s |  The patch does not generate ASF License warnings.  |
   |  |   |  64m 54s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux ce27af7213d4 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d50816fe44 |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/artifact/yetus-general-check/output/diff-checkstyle-root.txt |
   | shellcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/artifact/yetus-general-check/output/diff-patch-shellcheck.txt |
   | whitespace | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/artifact/yetus-general-check/output/whitespace-eol.txt |
   | Max. process+thread count | 122 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/console |
   | versions | git=2.17.1 maven=3.6.3 shellcheck=0.4.6 spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------



[GitHub] [hbase] lokiore commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
lokiore commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-732140244


   > @lokiore Can you also add your design doc under `dev-support/design-docs` as PDF?
   
   @virajjasani I have added the design doc as a PDF, as requested.


----------------------------------------------------------------



[GitHub] [hbase] lokiore commented on a change in pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
lokiore commented on a change in pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#discussion_r541777881



##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");
+          throw new RuntimeException(e);
+        }
+      } else {
+        LOG.error("Invalid Options");
+      }
+    } catch (Exception e) {
+      LOG.error("Error while starting ChaosService : " + e);
+    }
+  }
+
+  private static void ChaosServiceStart(Configuration conf, ChaosServiceName serviceName) {
+    switch (serviceName) {
+      case chaosagent:
+        ChaosAgent.stopChaosAgent.set(false);
+        try {
+          Thread t = new Thread(new ChaosAgent(conf,
+            ChaosUtils.getZKQuorum(conf), ChaosUtils.getHostName()));
+          t.start();
+          t.join();

Review comment:
       @virajjasani Can I take this change as a follow-up? It would require wider changes overall.
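
For readers following the thread: the comment above is attached to the `t.start(); t.join();` lines of the quoted diff. The reviewer's actual suggestion is not shown in this message, so the following is only a minimal, self-contained sketch of the pattern those lines use, assuming nothing beyond the JDK. `StartJoinSketch`, `Worker`, and the `stop` flag are hypothetical stand-ins (the flag loosely mirrors `ChaosAgent.stopChaosAgent` in the diff); none of this is code from the PR.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StartJoinSketch {
  // Hypothetical stand-in for the stop flag that the diff resets before starting the agent.
  static final AtomicBoolean stop = new AtomicBoolean(false);

  // Hypothetical worker: loops until the stop flag is set, counting its iterations.
  static class Worker implements Runnable {
    volatile int iterations = 0;

    @Override
    public void run() {
      while (!stop.get()) {
        iterations++;
        // For the sketch, the worker stops itself after three iterations.
        if (iterations >= 3) {
          stop.set(true);
        }
      }
    }
  }

  // Start the worker on its own thread, then block the caller until it finishes.
  static int startAndWait() throws InterruptedException {
    stop.set(false);
    Worker worker = new Worker();
    Thread t = new Thread(worker);
    t.start();
    t.join(); // the calling thread waits here until run() returns
    return worker.iterations;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("iterations: " + startAndWait());
  }
}
```

Because `join()` blocks the caller until the worker's `run()` returns, the starting thread cannot do anything else in the meantime; that is a property of the pattern itself, independent of whatever change was requested in the review.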




----------------------------------------------------------------



[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-743950848


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 51s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m  4s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 47s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 37s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 58s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m  1s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 47s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 47s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 31s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 58s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 166m 59s |  root in the patch passed.  |
   |  |   | 204m 34s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 3ebce04fe418 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d50816fe44 |
   | Default Java | AdoptOpenJDK-11.0.6+10 |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/testReport/ |
   | Max. process+thread count | 6490 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-732172237


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 17s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  Shelldocs was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m  3s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   2m 17s |  master passed  |
   | +1 :green_heart: |  spotbugs  |  10m  2s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 11s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 51s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 18s |  root: The patch generated 4 new + 0 unchanged - 0 fixed = 4 total (was 0)  |
   | -0 :warning: |  shellcheck  |   0m  2s |  The patch generated 37 new + 0 unchanged - 0 fixed = 37 total (was 0)  |
   | -0 :warning: |  whitespace  |   0m  0s |  The patch has 55 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  hadoopcheck  |  19m  4s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.  |
   | +1 :green_heart: |  spotbugs  |  10m 50s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 25s |  The patch does not generate ASF License warnings.  |
   |  |   |  62m 48s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 7e24bf67fc30 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 1cd8f3cf94 |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-general-check/output/diff-checkstyle-root.txt |
   | shellcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-general-check/output/diff-patch-shellcheck.txt |
   | whitespace | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-general-check/output/whitespace-eol.txt |
   | Max. process+thread count | 122 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/console |
   | versions | git=2.17.1 maven=3.6.3 shellcheck=0.4.6 spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-695803812


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 29s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | -1 :x: |  dupname  |   0m  0s |  The patch has 1 duplicated filenames that differ only in case.  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.13 Server=19.03.13 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 0259e06aa9e1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d41eb4f0f4 |
   | dupname | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/artifact/yetus-general-check/output/dupnames.txt |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/4/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-732333238


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 42s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 24s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 34s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 17s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 28s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 31s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 21s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 290m 46s |  root in the patch failed.  |
   |  |   | 324m 13s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 02110fccdb2c 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 1cd8f3cf94 |
   | Default Java | AdoptOpenJDK-1.8.0_232-b09 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/testReport/ |
   | Max. process+thread count | 4239 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/6/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] virajjasani merged pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
virajjasani merged pull request #2299:
URL: https://github.com/apache/hbase/pull/2299


   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-748461507


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 21s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  shelldocs  |   0m  1s |  Shelldocs was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m  2s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   2m 15s |  master passed  |
   | +1 :green_heart: |  spotbugs  |  10m  0s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 49s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 15s |  root: The patch generated 16 new + 0 unchanged - 0 fixed = 16 total (was 0)  |
   | -0 :warning: |  shellcheck  |   0m  1s |  The patch generated 26 new + 0 unchanged - 0 fixed = 26 total (was 0)  |
   | -0 :warning: |  whitespace  |   0m  0s |  The patch has 55 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  hadoopcheck  |  19m  0s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.  |
   | +1 :green_heart: |  spotbugs  |  11m  8s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 26s |  The patch does not generate ASF License warnings.  |
   |  |   |  62m 54s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 817a729c6f86 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f600856a3b |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/artifact/yetus-general-check/output/diff-checkstyle-root.txt |
   | shellcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/artifact/yetus-general-check/output/diff-patch-shellcheck.txt |
   | whitespace | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/artifact/yetus-general-check/output/whitespace-eol.txt |
   | Max. process+thread count | 123 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/console |
   | versions | git=2.17.1 maven=3.6.3 shellcheck=0.4.6 spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-743938169


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 31s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  Shelldocs was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 41s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 57s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   8m 55s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 30s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m  0s |  root: The patch generated 16 new + 0 unchanged - 0 fixed = 16 total (was 0)  |
   | -0 :warning: |  shellcheck  |   0m  1s |  The patch generated 26 new + 0 unchanged - 0 fixed = 26 total (was 0)  |
   | -0 :warning: |  whitespace  |   0m  0s |  The patch has 55 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  hadoopcheck  |  17m 23s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1 3.3.0.  |
   | +1 :green_heart: |  spotbugs  |   9m 39s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 28s |  The patch does not generate ASF License warnings.  |
   |  |   |  56m  9s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux b1dde5fdd636 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d50816fe44 |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/artifact/yetus-general-check/output/diff-checkstyle-root.txt |
   | shellcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/artifact/yetus-general-check/output/diff-patch-shellcheck.txt |
   | whitespace | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/artifact/yetus-general-check/output/whitespace-eol.txt |
   | Max. process+thread count | 137 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/console |
   | versions | git=2.17.1 maven=3.6.3 shellcheck=0.4.6 spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-743965688


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 27s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 20s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 38s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 20s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 29s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 24s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 36s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 22s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 300m 25s |  root in the patch failed.  |
   |  |   | 333m 46s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux e1f8c75a5cd2 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d50816fe44 |
   | Default Java | AdoptOpenJDK-1.8.0_232-b09 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/testReport/ |
   | Max. process+thread count | 4222 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/8/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679027678


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 44s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +0 :ok: |  shelldocs  |   0m  0s |  Shelldocs was not available.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 25s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 46s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   2m 29s |  master passed  |
   | +1 :green_heart: |  spotbugs  |  11m 35s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 15s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 31s |  the patch passed  |
   | -0 :warning: |  checkstyle  |   2m 32s |  root: The patch generated 78 new + 0 unchanged - 0 fixed = 78 total (was 0)  |
   | -0 :warning: |  shellcheck  |   0m  1s |  The patch generated 37 new + 0 unchanged - 0 fixed = 37 total (was 0)  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  13m 43s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | -1 :x: |  spotbugs  |   8m 39s |  root generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0)  |
   ||| _ Other Tests _ |
   | -1 :x: |  asflicense  |   0m 29s |  The patch generated 7 ASF License warnings.  |
   |  |   |  58m 14s |   |
   
   
   | Reason | Tests |
   |-------:|:------|
   | FindBugs | module:root |
   |  |  Switch statement found in org.apache.hadoop.hbase.chaos.ChaosAgent.process(WatchedEvent) where default case is missing  At ChaosAgent.java:[lines 473-486] |
   |  |  Switch statement found in org.apache.hadoop.hbase.chaos.ChaosAgent$3.processResult(int, String, Object, List) where default case is missing  At ChaosAgent.java:[lines 259-296] |
   |  |  Unread field:ChaosAgent.java:[line 521] |
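
The two "default case is missing" findings above are usually cleared by giving each `switch` an explicit `default` branch. The following is a minimal, self-contained sketch of that pattern only. It is not the code from `ChaosAgent.java`; the class name `SwitchDefaultSketch`, the enum `EventKind`, and the method `describe` are hypothetical names introduced for illustration.

```java
public class SwitchDefaultSketch {

    // Hypothetical event kinds; they stand in for whatever value the real switch inspects.
    enum EventKind { NODE_CREATED, NODE_DELETED, NODE_DATA_CHANGED, NONE }

    // Handles the kinds we care about and routes everything else to an explicit default.
    static String describe(EventKind kind) {
        switch (kind) {
            case NODE_CREATED:
                return "created";
            case NODE_DELETED:
                return "deleted";
            default:
                // Explicit fallback: unhandled kinds are reported rather than silently skipped.
                return "ignored: " + kind;
        }
    }

    public static void main(String[] args) {
        for (EventKind kind : EventKind.values()) {
            System.out.println(kind + " -> " + describe(kind));
        }
    }
}
```

Whether the fallback should log, throw, or do nothing depends on the caller; the sketch simply returns a marker string so the unhandled path is visible.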
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | dupname asflicense shellcheck shelldocs spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 8143f460a12e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 2874f00a2f |
   | checkstyle | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-general-check/output/diff-checkstyle-root.txt |
   | shellcheck | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-general-check/output/diff-patch-shellcheck.txt |
   | spotbugs | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-general-check/output/new-spotbugs-root.html |
   | asflicense | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-general-check/output/patch-asflicense-problems.txt |
   | Max. process+thread count | 139 (vs. ulimit of 12500) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) shellcheck=0.4.6 spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] virajjasani commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
virajjasani commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-732093347


   @lokiore Can you also add your design doc under `dev-support/design-docs` as a PDF?





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679091049


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 53s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 47s |  master passed  |
   | +1 :green_heart: |  compile  |   3m  6s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   7m  6s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 19s |  root in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 15s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   5m 12s |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 39s |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 39s |  the patch passed  |
   | -1 :x: |  shadedjars  |   0m 11s |  patch has 7 errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 18s |  hbase-it in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 17s |  root in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 174m 21s |  root in the patch passed.  |
   |  |   | 204m 48s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 2e8739c5012b 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 2874f00a2f |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-root.txt |
   | shadedjars | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-jdk11-hadoop3-check/output/patch-shadedjars.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-it.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/testReport/ |
   | Max. process+thread count | 6915 (vs. ulimit of 12500) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




[GitHub] [hbase] virajjasani commented on a change in pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#discussion_r529587836



##########
File path: bin/chaos-daemon.sh
##########
@@ -0,0 +1,140 @@
+#!/usr/bin/env bash
+#
+#/**
+# * Licensed to the Apache Software Foundation (ASF) under one
+# * or more contributor license agreements.  See the NOTICE file
+# * distributed with this work for additional information
+# * regarding copyright ownership.  The ASF licenses this file
+# * to you under the Apache License, Version 2.0 (the
+# * "License"); you may not use this file except in compliance
+# * with the License.  You may obtain a copy of the License at
+# *
+# *     http://www.apache.org/licenses/LICENSE-2.0
+# *
+# * Unless required by applicable law or agreed to in writing, software
+# * distributed under the License is distributed on an "AS IS" BASIS,
+# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# * See the License for the specific language governing permissions and
+# * limitations under the License.
+# */
+#
+
+usage="Usage: chaos-daemon.sh (start|stop) chaosagent"
+
+# if no args specified, show usage
+if [ $# -le 1 ]; then
+  echo $usage
+  exit 1
+fi
+
+# get arguments
+startStop=$1
+shift
+
+command=$1
+shift
+
+check_before_start(){
+    #ckeck if the process is not running
+    mkdir -p "$HBASE_PID_DIR"
+    if [ -f $CHAOS_PID ]; then
+      if kill -0 `cat $CHAOS_PID` > /dev/null 2>&1; then
+        echo $command running as process `cat $CHAOS_PID`.  Stop it first.
+        exit 1
+      fi
+    fi
+}

Review comment:
       nit: will this work? It quotes the variables and uses `$(...)` instead of backticks:
   ```
   check_before_start(){
       # check if the process is not running
       mkdir -p "$HBASE_PID_DIR"
       if [ -f "$CHAOS_PID" ]; then
         if kill -0 "$(cat "$CHAOS_PID")" > /dev/null 2>&1; then
           echo "$command" running as process "$(cat "$CHAOS_PID")".  Stop it first.
           exit 1
         fi
       fi
   }
   ```

##########
File path: bin/chaos-daemon.sh
##########
@@ -0,0 +1,140 @@
+#!/usr/bin/env bash
+#
+#/**
+# * Licensed to the Apache Software Foundation (ASF) under one
+# * or more contributor license agreements.  See the NOTICE file
+# * distributed with this work for additional information
+# * regarding copyright ownership.  The ASF licenses this file
+# * to you under the Apache License, Version 2.0 (the
+# * "License"); you may not use this file except in compliance
+# * with the License.  You may obtain a copy of the License at
+# *
+# *     http://www.apache.org/licenses/LICENSE-2.0
+# *
+# * Unless required by applicable law or agreed to in writing, software
+# * distributed under the License is distributed on an "AS IS" BASIS,
+# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# * See the License for the specific language governing permissions and
+# * limitations under the License.
+# */
+#
+
+usage="Usage: chaos-daemon.sh (start|stop) chaosagent"
+
+# if no args specified, show usage
+if [ $# -le 1 ]; then
+  echo $usage
+  exit 1
+fi
+
+# get arguments
+startStop=$1
+shift
+
+command=$1
+shift
+
+check_before_start(){
+    #ckeck if the process is not running
+    mkdir -p "$HBASE_PID_DIR"
+    if [ -f $CHAOS_PID ]; then
+      if kill -0 `cat $CHAOS_PID` > /dev/null 2>&1; then
+        echo $command running as process `cat $CHAOS_PID`.  Stop it first.
+        exit 1
+      fi
+    fi
+}
+
+bin=`dirname "${BASH_SOURCE-$0}"`
+bin=`cd "$bin">/dev/null; pwd`

Review comment:
       nit: use `$(...)` and exit if `cd` fails: `bin=$(cd "$bin">/dev/null || exit; pwd)`

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");
+          throw new RuntimeException(e);
+        }
+      } else {
+        LOG.error("Invalid Options");
+      }
+    } catch (Exception e) {
+      LOG.error("Error while starting ChaosService : " + e);

Review comment:
       Same as above: `e` should be the last argument rather than concatenated into the message, so that the stack trace is logged.

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");

Review comment:
       nit: let's use placeholders and pass the exception as the last argument, i.e.:
   ```
             LOG.error("action passed: {} . Unexpected action. Please provide only start/stop.",
               actionStr, e);
   ```
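To illustrate why the exception should be passed as an argument rather than concatenated, here is a minimal, self-contained sketch. It does not use SLF4J, so that it runs with the JDK alone; the class `ExceptionLoggingSketch` and its helper methods are hypothetical and exist only for this illustration. `concatenated` shows what `"..." + e` puts into the message, and `withStackTrace` approximates what a logger can render when it receives the `Throwable` itself.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration class, not part of the PR.
final class ExceptionLoggingSketch {

  // String concatenation calls e.toString(): class name and message only.
  static String concatenated(Exception e) {
    return "Error while starting ChaosService : " + e;
  }

  // With the Throwable in hand, the stack trace and the cause chain
  // can be rendered as well.
  static String withStackTrace(Exception e) {
    StringWriter sw = new StringWriter();
    e.printStackTrace(new PrintWriter(sw, true));
    return "Error while starting ChaosService" + System.lineSeparator() + sw;
  }

  // Builds an exception with a cause, similar to a wrapped parsing failure.
  static Exception sampleFailure() {
    try {
      Integer.parseInt("not-a-number");
      throw new AssertionError("unreachable");
    } catch (NumberFormatException cause) {
      return new IllegalStateException("could not parse action", cause);
    }
  }

  public static void main(String[] args) {
    Exception e = sampleFailure();
    System.out.println(concatenated(e));
    System.out.println(withStackTrace(e));
  }
}
```

The concatenated form drops both the stack frames and the `Caused by:` chain, which is usually the information needed when debugging a failed start.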

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting status setting again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exists: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Persistent ZNode tries to create
+   * ZNode again if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Ephemeral ZNode tries to create ZNode again
+   * if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: ",
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting command,
+   * after getting command successfully, it executes command and
+   * set its status with respect to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connection to the server has been lost while getting task, getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            new String(data));
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occured after setting status: " + e);
+          }
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Callback used while getting Tasks for agent if call executed without Exception,
+   * It creates a separate thread for each children to execute given Tasks parallely.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+
+        case OK:
+          if (children != null) {
+            try {
+
+              LOG.info("Executing each task as a separate thread");
+              List<Thread> tasksList = new ArrayList<>();
+              for (String task : children) {
+                String threadName = agentName + "_" + task;
+                Thread t = new Thread(() -> {
+
+                  LOG.info("Executing task : " + task + " of agent : " + agentName);
+                  zk.getData(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + task,
+                    false,
+                    getTaskForExecutionCallback,
+                    task);
+
+                });
+                t.setName(threadName);
+                t.start();
+                tasksList.add(t);
+
+                for (Thread thread : tasksList) {
+                  thread.join();
+                }
+              }
+            } catch (InterruptedException e) {
+              LOG.error("Error scheduling next task : " +
+                " for agent : " + agentName + " Error : " + e);
+            }
+          }
+
+        default:
+          LOG.error("Error occurred while getting task",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Function to create PERSISTENT ZNODE with given path and data given as params
+   * @param path Path at which ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.PERSISTENT,
+      createZNodeCallback,
+      data);
+  }
+
+  /***
+   * Function to create EPHEMERAL ZNODE with given path and data as params.
+   * @param path Path at which Ephemeral ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createEphemeralZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.EPHEMERAL,
+      createEphemeralZNodeCallback,
+      data);
+  }
+
+  /**
+   * Checks if given ZNode exists, if not creates a PERSISTENT ZNODE for same.
+   *
+   * @param path Path to check for ZNode
+   */
+  private void createIfZNodeNotExists(String path) {
+    try {
+      if (zk.exists(path,
+        false) == null) {
+        createZNode(path, new byte[0]);
+      }
+    } catch (KeeperException | InterruptedException e) {
+      LOG.error("Error checking given node : " + path + " " + e);
+    }
+  }
+
+  /**
+   * sets given Status for Task Znode
+   *
+   * @param taskZNode ZNode to set status
+   * @param status Status value
+   */
+  public void setStatusOfTaskZNode(String taskZNode, String status) {
+    LOG.info("Setting status of Task ZNode: " + taskZNode + " status : " + status);
+    zk.setData(taskZNode,
+      status.getBytes(),
+      -1,
+      setStatusOfTaskZNodeCallback,
+      null);
+  }
+
+  /**
+   * registration of ChaosAgent by checking and creating necessary ZNodes.
+   */
+  private void register() {
+    createIfZNodeNotExists(ChaosConstants.CHAOS_TEST_ROOT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName);
+
+    createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+  }
+
+  /***
+   * Gets tasks for execution, basically sets Watch on it's respective host's Znode and
+   * waits for tasks to be assigned, also has a getTasksForAgentCallback
+   * which handles execution of task.
+   */
+  private void getTasks() {
+    LOG.info("Getting Tasks for Agent: " + agentName + "and setting watch for new Tasks");
+    zk.getChildren(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+        ChaosConstants.ZNODE_PATH_SEPARATOR + agentName,
+      newTaskCreatedWatcher,
+      getTasksForAgentCallback,
+      null);
+  }
+
+  /**
+   * Below function executes command with retries with given user.
+   * Uses LocalShell to execute a command.
+   *
+   * @param user user name, default none
+   * @param cmd Command to execute
+   * @return A pair of Exit Code and Shell output
+   * @throws IOException Exception while executing shell command
+   */
+  private Pair<Integer, String> execWithRetries(String user, String cmd) throws IOException {
+    RetryCounter retryCounter = retryCounterFactory.create();
+    while (true) {
+      try {
+        return exec(user, cmd);
+      } catch (IOException e) {
+        retryOrThrow(retryCounter, e, user, cmd);
+      }
+      try {
+        retryCounter.sleepUntilNextRetry();
+      } catch (InterruptedException e) {
+        LOG.warn("Sleep Interrupted: " + e);
+      }
+    }
+  }
+
+  private Pair<Integer, String> exec(String user, String cmd) throws IOException {
+    LOG.info("Executing Shell command: " + cmd + " , user: " + user);
+
+    LocalShell shell = new LocalShell(user, cmd);
+    try {
+      shell.execute();
+    } catch (Shell.ExitCodeException e) {
+      String output = shell.getOutput();
+      throw new Shell.ExitCodeException(e.getExitCode(), "stderr: " + e.getMessage()
+        + ", stdout: " + output);
+    }
+    LOG.info("Executed Shell command, exit code: " + shell.getExitCode() +
+      " , output:" + shell.getOutput());
+
+    return new Pair<>(shell.getExitCode(), shell.getOutput());
+  }
+
+  private <E extends Exception> void retryOrThrow(RetryCounter retryCounter, E ex,
+    String user, String cmd) throws E {
+    if (retryCounter.shouldRetry()) {
+      LOG.warn("Local command: " + cmd + " , user:" + user
+        + " failed at attempt " + retryCounter.getAttemptTimes() + ". Retrying until maxAttempts: "
+        + retryCounter.getMaxAttempts() + ". Exception: " + ex.getMessage());
+      return;
+    }
+    throw ex;
+  }
+
+  private boolean isConnected() {
+    return connected;
+  }
+
+  @Override
+  public void close() throws IOException {
+    LOG.info("Closing ZooKeeper Connection for Chaos Agent : " + agentName);
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error while closing ZooKeeper Connection.");
+    }
+  }
+
+  @Override
+  public void run() {
+    try {
+      LOG.info("Running Chaos Agent on : " + agentName);
+      while (!this.isConnected()) {
+        Thread.sleep(100);
+      }
+      this.getTasks();
+      while (!stopChaosAgent.get()) {
+        Thread.sleep(500);
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error while running Chaos Agent");
+    }
+
+  }
+
+  @Override
+  public void process(WatchedEvent watchedEvent) {
+    LOG.info("Processing event: " + watchedEvent.toString());
+    if (watchedEvent.getType() == Event.EventType.None) {
+      switch (watchedEvent.getState()) {
+        case SyncConnected:
+          connected = true;
+          break;
+        case Disconnected:
+          connected = false;
+          break;
+        case Expired:
+          connected = false;
+          LOG.error("Session expired creating again");
+          try {
+            createZKConnection(null);
+          } catch (IOException e) {
+            LOG.error("Error creating Zookeeper connection");
+          }
+        default:
+          LOG.error("Unknown State");
+          break;
+      }
+    }
+  }
+
+  private void recreateZKConnection() throws Exception{
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error closing ZK connection : " + e);
+      throw new RuntimeException(e) ;
+    } finally {
+      try {
+        createZKConnection(newTaskCreatedWatcher);
+        createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+      } catch (IOException e) {
+        LOG.error("Error creating new ZK COnnection for agent: " + agentName + e);
+        throw new RuntimeException(e);
+      }
+    }
+  }

Review comment:
       Throwing a RuntimeException from within `finally` can mask the exception thrown earlier and complicate debugging. Let's keep everything in a single `try` block:
   ```
     private void recreateZKConnection() throws Exception {
       try {
         zk.close();
         createZKConnection(newTaskCreatedWatcher);
         createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
           ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
       } catch (InterruptedException | IOException e) {
         LOG.error("Error recreating new ZK Connection for agent: {}", agentName, e);
         throw e;
       }
     }
   ```
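The masking concern can be reproduced with a small sketch. The class `FinallyMaskingSketch` and the always-failing `close()` and `reconnect()` helpers are hypothetical stand-ins for `zk.close()` and `createZKConnection(...)`; they are assumptions made so the example runs without ZooKeeper.

```java
import java.io.IOException;

// Hypothetical illustration class, not part of the PR.
final class FinallyMaskingSketch {

  // Stand-in for zk.close(); always fails in this sketch.
  static void close() throws InterruptedException {
    throw new InterruptedException("close interrupted");
  }

  // Stand-in for createZKConnection(...); always fails in this sketch.
  static void reconnect() throws IOException {
    throw new IOException("reconnect failed");
  }

  // Same shape as the original method: catch rethrows, finally throws again.
  static void throwInFinally() {
    try {
      close();
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    } finally {
      try {
        reconnect();
      } catch (IOException e) {
        // This exception replaces the one thrown from the catch block above.
        throw new RuntimeException(e);
      }
    }
  }

  // Same shape as the suggestion: one sequence, the first failure propagates.
  static void singleTry() throws InterruptedException, IOException {
    close();
    reconnect();
  }

  static String failureOfThrowInFinally() {
    try {
      throwInFinally();
      return "none";
    } catch (RuntimeException e) {
      return e.getCause().getMessage();
    }
  }

  static String failureOfSingleTry() {
    try {
      singleTry();
      return "none";
    } catch (InterruptedException | IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(failureOfThrowInFinally()); // the close failure is lost
    System.out.println(failureOfSingleTry());      // the close failure surfaces
  }
}
```

One behavioral difference is worth noting: with everything in a single `try`, a failure in `close()` means the reconnect is no longer attempted, whereas the `finally` version always attempts it.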

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting status setting again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exists: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Persistent ZNode tries to create
+   * ZNode again if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Ephemeral ZNode tries to create ZNode again
+   * if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: ",
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting command,
+   * after getting command successfully, it executes command and
+   * set its status with respect to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connection to the server has been lost while getting task, getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            new String(data));
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occured after setting status: " + e);
+          }
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Callback used while getting Tasks for agent if call executed without Exception,
+   * It creates a separate thread for each children to execute given Tasks parallely.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+
+        case OK:
+          if (children != null) {
+            try {
+
+              LOG.info("Executing each task as a separate thread");
+              List<Thread> tasksList = new ArrayList<>();
+              for (String task : children) {
+                String threadName = agentName + "_" + task;
+                Thread t = new Thread(() -> {
+
+                  LOG.info("Executing task : " + task + " of agent : " + agentName);
+                  zk.getData(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + task,
+                    false,
+                    getTaskForExecutionCallback,
+                    task);
+
+                });
+                t.setName(threadName);
+                t.start();
+                tasksList.add(t);
+
+                for (Thread thread : tasksList) {
+                  thread.join();
+                }
+              }
+            } catch (InterruptedException e) {
+              LOG.error("Error scheduling next task : " +
+                " for agent : " + agentName + " Error : " + e);
+            }
+          }
+
+        default:
+          LOG.error("Error occurred while getting task",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Function to create PERSISTENT ZNODE with given path and data given as params
+   * @param path Path at which ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.PERSISTENT,
+      createZNodeCallback,
+      data);
+  }
+
+  /***
+   * Function to create EPHEMERAL ZNODE with given path and data as params.
+   * @param path Path at which Ephemeral ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createEphemeralZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.EPHEMERAL,
+      createEphemeralZNodeCallback,
+      data);
+  }
+
+  /**
+   * Checks if given ZNode exists, if not creates a PERSISTENT ZNODE for same.
+   *
+   * @param path Path to check for ZNode
+   */
+  private void createIfZNodeNotExists(String path) {
+    try {
+      if (zk.exists(path,
+        false) == null) {
+        createZNode(path, new byte[0]);
+      }
+    } catch (KeeperException | InterruptedException e) {
+      LOG.error("Error checking given node : " + path + " " + e);
+    }
+  }
+
+  /**
+   * sets given Status for Task Znode
+   *
+   * @param taskZNode ZNode to set status
+   * @param status Status value
+   */
+  public void setStatusOfTaskZNode(String taskZNode, String status) {
+    LOG.info("Setting status of Task ZNode: " + taskZNode + " status : " + status);
+    zk.setData(taskZNode,
+      status.getBytes(),
+      -1,
+      setStatusOfTaskZNodeCallback,
+      null);
+  }
+
+  /**
+   * registration of ChaosAgent by checking and creating necessary ZNodes.
+   */
+  private void register() {
+    createIfZNodeNotExists(ChaosConstants.CHAOS_TEST_ROOT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName);
+
+    createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+  }
+
+  /***
+   * Gets tasks for execution, basically sets Watch on it's respective host's Znode and
+   * waits for tasks to be assigned, also has a getTasksForAgentCallback
+   * which handles execution of task.
+   */
+  private void getTasks() {
+    LOG.info("Getting Tasks for Agent: " + agentName + "and setting watch for new Tasks");
+    zk.getChildren(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+        ChaosConstants.ZNODE_PATH_SEPARATOR + agentName,
+      newTaskCreatedWatcher,
+      getTasksForAgentCallback,
+      null);
+  }
+
+  /**
+   * Below function executes command with retries with given user.
+   * Uses LocalShell to execute a command.
+   *
+   * @param user user name, default none
+   * @param cmd Command to execute
+   * @return A pair of Exit Code and Shell output
+   * @throws IOException Exception while executing shell command
+   */
+  private Pair<Integer, String> execWithRetries(String user, String cmd) throws IOException {
+    RetryCounter retryCounter = retryCounterFactory.create();
+    while (true) {
+      try {
+        return exec(user, cmd);
+      } catch (IOException e) {
+        retryOrThrow(retryCounter, e, user, cmd);
+      }
+      try {
+        retryCounter.sleepUntilNextRetry();
+      } catch (InterruptedException e) {
+        LOG.warn("Sleep Interrupted: " + e);
+      }
+    }
+  }
+
+  private Pair<Integer, String> exec(String user, String cmd) throws IOException {
+    LOG.info("Executing Shell command: " + cmd + " , user: " + user);
+
+    LocalShell shell = new LocalShell(user, cmd);
+    try {
+      shell.execute();
+    } catch (Shell.ExitCodeException e) {
+      String output = shell.getOutput();
+      throw new Shell.ExitCodeException(e.getExitCode(), "stderr: " + e.getMessage()
+        + ", stdout: " + output);
+    }
+    LOG.info("Executed Shell command, exit code: " + shell.getExitCode() +
+      " , output:" + shell.getOutput());
+
+    return new Pair<>(shell.getExitCode(), shell.getOutput());
+  }
+
+  private <E extends Exception> void retryOrThrow(RetryCounter retryCounter, E ex,
+    String user, String cmd) throws E {
+    if (retryCounter.shouldRetry()) {
+      LOG.warn("Local command: " + cmd + " , user:" + user
+        + " failed at attempt " + retryCounter.getAttemptTimes() + ". Retrying until maxAttempts: "
+        + retryCounter.getMaxAttempts() + ". Exception: " + ex.getMessage());
+      return;
+    }
+    throw ex;
+  }
+
+  private boolean isConnected() {
+    return connected;
+  }
+
+  @Override
+  public void close() throws IOException {
+    LOG.info("Closing ZooKeeper Connection for Chaos Agent : " + agentName);
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error while closing ZooKeeper Connection.");
+    }
+  }
+
+  @Override
+  public void run() {
+    try {
+      LOG.info("Running Chaos Agent on : " + agentName);
+      while (!this.isConnected()) {
+        Thread.sleep(100);
+      }
+      this.getTasks();
+      while (!stopChaosAgent.get()) {
+        Thread.sleep(500);
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error while running Chaos Agent");
+    }
+
+  }
+
+  @Override
+  public void process(WatchedEvent watchedEvent) {
+    LOG.info("Processing event: " + watchedEvent.toString());
+    if (watchedEvent.getType() == Event.EventType.None) {
+      switch (watchedEvent.getState()) {
+        case SyncConnected:
+          connected = true;
+          break;
+        case Disconnected:
+          connected = false;
+          break;
+        case Expired:
+          connected = false;
+          LOG.error("Session expired creating again");
+          try {
+            createZKConnection(null);
+          } catch (IOException e) {
+            LOG.error("Error creating Zookeeper connection");
+          }
+        default:
+          LOG.error("Unknown State");
+          break;
+      }
+    }
+  }
+
+  private void recreateZKConnection() throws Exception{
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error closing ZK connection : " + e);
+      throw new RuntimeException(e) ;
+    } finally {
+      try {
+        createZKConnection(newTaskCreatedWatcher);
+        createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+      } catch (IOException e) {
+        LOG.error("Error creating new ZK COnnection for agent: " + agentName + e);
+        throw new RuntimeException(e);
+      }
+    }
+  }
+
+  /**
+   * Executes Command locally.
+   */
+  protected class LocalShell extends Shell.ShellCommandExecutor {

Review comment:
       This nested class should be `static`.
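
For context, a small sketch of the difference, assuming the nested class does not need the enclosing instance. `AgentSketch`, `InnerShell` and `NestedShell` are hypothetical names for illustration only, not classes from this PR.

```java
class AgentSketch {

  private final String agentName;

  AgentSketch(String agentName) {
    this.agentName = agentName;
  }

  // Non-static inner class: every instance carries a hidden reference to
  // the enclosing AgentSketch and can only be created through one.
  class InnerShell {
    String describe() {
      return "inner shell of " + agentName;
    }
  }

  // Static nested class: no reference to an enclosing instance, so it can
  // be created on its own and does not keep an AgentSketch alive.
  static class NestedShell {
    private final String cmd;

    NestedShell(String cmd) {
      this.cmd = cmd;
    }

    String describe() {
      return "nested shell for " + cmd;
    }
  }

  public static void main(String[] args) {
    // The static nested class needs no outer instance.
    System.out.println(new AgentSketch.NestedShell("ls").describe());
    // The inner class must be created from an outer instance.
    System.out.println(new AgentSketch("host1").new InnerShell().describe());
  }
}
```

If the nested class does read fields of the enclosing object, those values would have to be passed in through its constructor instead.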

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing

Review comment:
       nit: typo, should this be `Watchers`?

##########
File path: bin/chaos-daemon.sh
##########
@@ -0,0 +1,140 @@
+#!/usr/bin/env bash
+#
+#/**
+# * Licensed to the Apache Software Foundation (ASF) under one
+# * or more contributor license agreements.  See the NOTICE file
+# * distributed with this work for additional information
+# * regarding copyright ownership.  The ASF licenses this file
+# * to you under the Apache License, Version 2.0 (the
+# * "License"); you may not use this file except in compliance
+# * with the License.  You may obtain a copy of the License at
+# *
+# *     http://www.apache.org/licenses/LICENSE-2.0
+# *
+# * Unless required by applicable law or agreed to in writing, software
+# * distributed under the License is distributed on an "AS IS" BASIS,
+# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# * See the License for the specific language governing permissions and
+# * limitations under the License.
+# */
+#
+
+usage="Usage: chaos-daemon.sh (start|stop) chaosagent"
+
+# if no args specified, show usage
+if [ $# -le 1 ]; then
+  echo $usage

Review comment:
       nit: quote the variable, i.e. `echo "$usage"`?

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");
+          throw new RuntimeException(e);
+        }
+      } else {
+        LOG.error("Invalid Options");
+      }
+    } catch (Exception e) {
+      LOG.error("Error while starting ChaosService : " + e);
+    }
+  }
+
+  private static void ChaosServiceStart(Configuration conf, ChaosServiceName serviceName) {
+    switch (serviceName) {
+      case chaosagent:
+        ChaosAgent.stopChaosAgent.set(false);
+        try {
+          Thread t = new Thread(new ChaosAgent(conf,
+            ChaosUtils.getZKQuorum(conf), ChaosUtils.getHostName()));
+          t.start();
+          t.join();
+        } catch (InterruptedException | UnknownHostException e) {
+          LOG.error("Failed while executing next task execution of ChaosAgent on : " +
+            serviceName + " : " + e);
+        }
+        break;
+      default:
+        LOG.error("Service Name not known : " + serviceName.toString());
+    }
+  }
+
+  private static void ChaosServiceStop() {
+    ChaosAgent.stopChaosAgent.set(true);
+  }
+
+  private static Options getOptions() {
+    Options options = new Options();
+    options.addOption(new Option("c", ChaosServiceName.chaosagent.toString(),
+      true, "expecting a start/stop argument"));
+    options.addOption(new Option("D", ChaosServiceName.GENERIC.toString(),
+      true, "generic D param"));
+    LOG.info(Arrays.toString(new Collection[] { options.getOptions() }));
+    return options;
+  }
+
+  public static void main(String[] args) throws Exception {
+    Configuration conf = HBaseConfiguration.create();
+    new GenericOptionsParser(conf, args);
+
+    ChoreService choreChaosService = null;
+    ScheduledChore authChore = AuthUtil.getAuthChore(conf);
+
+    try {
+      if (authChore != null) {
+        choreChaosService = new ChoreService(ChaosConstants.CHORE_SERVICE_PREFIX);
+        choreChaosService.scheduleChore(authChore);
+      }
+
+      execute(args, conf);
+    } finally {
+      if (authChore != null)
+        choreChaosService.shutdown();
+    }
+  }
+
+  enum ChaosServiceName {
+    chaosagent,
+    GENERIC
+  }
+
+
+  enum ExecutorAction {
+    start,
+    stop

Review comment:
       Same as above: enum constants should be capitalized (`START`, `STOP`).
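
A possible shape for this, as a sketch only. `ExecutorActionSketch` and `parse` are hypothetical names, and using `Locale.ROOT` for the case conversion is my assumption rather than something in the patch. The user-facing `start`/`stop` arguments can still be accepted in any case.

```java
import java.util.Locale;

class ExecutorActionSketch {

  // Upper-case constants, following the usual Java naming convention.
  enum ExecutorAction {
    START,
    STOP
  }

  // Maps a command-line value such as "start" or "Stop" to the enum constant.
  // Enum.valueOf throws IllegalArgumentException for anything else.
  static ExecutorAction parse(String actionStr) {
    return ExecutorAction.valueOf(actionStr.toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(parse("start"));
    System.out.println(parse("Stop"));
  }
}
```

The existing `catch (IllegalArgumentException e)` in `execute` would keep working unchanged with this approach.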

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());

Review comment:
       nit: `LoggerFactory.getLogger(ChaosAgent.class)` should be enough.

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");
+          throw new RuntimeException(e);
+        }
+      } else {
+        LOG.error("Invalid Options");
+      }
+    } catch (Exception e) {
+      LOG.error("Error while starting ChaosService : " + e);
+    }
+  }
+
+  private static void ChaosServiceStart(Configuration conf, ChaosServiceName serviceName) {
+    switch (serviceName) {
+      case chaosagent:
+        ChaosAgent.stopChaosAgent.set(false);
+        try {
+          Thread t = new Thread(new ChaosAgent(conf,
+            ChaosUtils.getZKQuorum(conf), ChaosUtils.getHostName()));
+          t.start();
+          t.join();
+        } catch (InterruptedException | UnknownHostException e) {
+          LOG.error("Failed while executing next task execution of ChaosAgent on : " +
+            serviceName + " : " + e);
+        }
+        break;
+      default:
+        LOG.error("Service Name not known : " + serviceName.toString());
+    }
+  }
+
+  private static void ChaosServiceStop() {
+    ChaosAgent.stopChaosAgent.set(true);
+  }
+
+  private static Options getOptions() {
+    Options options = new Options();
+    options.addOption(new Option("c", ChaosServiceName.chaosagent.toString(),
+      true, "expecting a start/stop argument"));
+    options.addOption(new Option("D", ChaosServiceName.GENERIC.toString(),
+      true, "generic D param"));
+    LOG.info(Arrays.toString(new Collection[] { options.getOptions() }));
+    return options;
+  }
+
+  public static void main(String[] args) throws Exception {
+    Configuration conf = HBaseConfiguration.create();
+    new GenericOptionsParser(conf, args);
+
+    ChoreService choreChaosService = null;
+    ScheduledChore authChore = AuthUtil.getAuthChore(conf);
+
+    try {
+      if (authChore != null) {
+        choreChaosService = new ChoreService(ChaosConstants.CHORE_SERVICE_PREFIX);
+        choreChaosService.scheduleChore(authChore);
+      }
+
+      execute(args, conf);
+    } finally {
+      if (authChore != null)
+        choreChaosService.shutdown();
+    }
+  }
+
+  enum ChaosServiceName {
+    chaosagent,

Review comment:
       Enum values are constants and should always be written in upper case. If we need the lower-case strings, we can use an enum with a constructor.
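
A minimal sketch of the enum-with-constructor idea, not the code from this PR. The constant names follow the usual upper-case convention, and the lower-case command-line string lives in a field. The `optionName` field, the `getOptionName()` and `fromOptionName()` helpers, and the `"generic"` value are assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical rewrite of ChaosServiceName: upper-case constants that carry
// the lower-case string expected on the command line.
enum ChaosServiceName {
  CHAOSAGENT("chaosagent"),
  GENERIC("generic"); // assumed option string, not taken from the patch

  private final String optionName;

  ChaosServiceName(String optionName) {
    this.optionName = optionName;
  }

  // Returns the lower-case name used as the long option.
  String getOptionName() {
    return optionName;
  }

  // Case-insensitive lookup by option name; throws on unknown input.
  static ChaosServiceName fromOptionName(String value) {
    String normalized = value.toLowerCase(Locale.ROOT);
    for (ChaosServiceName name : values()) {
      if (name.optionName.equals(normalized)) {
        return name;
      }
    }
    throw new IllegalArgumentException("Unknown service name: " + value);
  }
}
```

With something like this, callers would pass `ChaosServiceName.CHAOSAGENT.getOptionName()` to the option parser instead of relying on `toString()` of a lower-case constant.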

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");
+          throw new RuntimeException(e);
+        }
+      } else {
+        LOG.error("Invalid Options");
+      }
+    } catch (Exception e) {
+      LOG.error("Error while starting ChaosService : " + e);
+    }
+  }
+
+  private static void ChaosServiceStart(Configuration conf, ChaosServiceName serviceName) {
+    switch (serviceName) {
+      case chaosagent:
+        ChaosAgent.stopChaosAgent.set(false);
+        try {
+          Thread t = new Thread(new ChaosAgent(conf,
+            ChaosUtils.getZKQuorum(conf), ChaosUtils.getHostName()));
+          t.start();
+          t.join();
+        } catch (InterruptedException | UnknownHostException e) {
+          LOG.error("Failed while executing next task execution of ChaosAgent on : " +
+            serviceName + " : " + e);

Review comment:
       Same as above: use `{}` placeholders and pass the exception as the last argument.

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosService.java
##########
@@ -0,0 +1,138 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.net.UnknownHostException;
+import java.util.Arrays;
+import java.util.Collection;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.AuthUtil;
+import org.apache.hadoop.hbase.ChoreService;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.ScheduledChore;
+import org.apache.hadoop.util.GenericOptionsParser;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.GnuParser;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Option;
+import org.apache.hbase.thirdparty.org.apache.commons.cli.Options;
+
+/**
+ * Class used to start/stop Chaos related services (currently chaosagent)
+ */
+@InterfaceAudience.Private
+public class ChaosService {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosService.class.getName());
+
+  public static void execute(String[] args, Configuration conf) {
+    LOG.info("arguments : " + Arrays.toString(args));
+
+    try {
+      CommandLine cmdline = new GnuParser().parse(getOptions(), args);
+      if (cmdline.hasOption(ChaosServiceName.chaosagent.toString())) {
+        String actionStr = cmdline.getOptionValue(ChaosServiceName.chaosagent.toString());
+        try {
+          ExecutorAction action = ExecutorAction.valueOf(actionStr.toLowerCase());
+          if (action == ExecutorAction.start) {
+            ChaosServiceStart(conf, ChaosServiceName.chaosagent);
+          } else if (action == ExecutorAction.stop) {
+            ChaosServiceStop();
+          }
+        } catch (IllegalArgumentException e) {
+          LOG.error("action passed:" + actionStr +
+            " . Unexpected action. Please provide only start/stop.");
+          throw new RuntimeException(e);
+        }
+      } else {
+        LOG.error("Invalid Options");
+      }
+    } catch (Exception e) {
+      LOG.error("Error while starting ChaosService : " + e);
+    }
+  }
+
+  private static void ChaosServiceStart(Configuration conf, ChaosServiceName serviceName) {
+    switch (serviceName) {
+      case chaosagent:
+        ChaosAgent.stopChaosAgent.set(false);
+        try {
+          Thread t = new Thread(new ChaosAgent(conf,
+            ChaosUtils.getZKQuorum(conf), ChaosUtils.getHostName()));
+          t.start();
+          t.join();

Review comment:
       Can you explore using an `ExecutorService` here, created with `Executors.newSingleThreadExecutor()`? In order to block on `ChaosAgent` execution, this might require `ChaosAgent` to implement `Callable` instead of `Runnable`.
   If this is too much of a change, it's OK to leave it as is, but using an `ExecutorService` is definitely better than running a single `Thread` without any pool.
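
A rough sketch of what this suggestion could look like, using only `java.util.concurrent`. `AgentTask` is a hypothetical stand-in for `ChaosAgent`, and `SingleThreadRunner.runAndWait` is an illustrative helper, not something in the patch. Blocking on `Future.get()` takes the place of `Thread.join()`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for ChaosAgent, implementing Callable rather than Runnable.
class AgentTask implements Callable<Boolean> {
  @Override
  public Boolean call() {
    // The real agent would loop here until it is asked to stop.
    return Boolean.TRUE;
  }
}

class SingleThreadRunner {
  // Submits the task to a single-thread executor and blocks until it finishes.
  static boolean runAndWait(Callable<Boolean> task)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<Boolean> result = executor.submit(task);
      // get() blocks the caller, much like Thread.join() in the current code.
      return result.get();
    } finally {
      // Always release the worker thread, even if the task failed.
      executor.shutdown();
    }
  }
}
```

One difference from the plain `Thread` version is that an exception thrown inside `call()` reaches the caller wrapped in an `ExecutionException` instead of being lost on the worker thread.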

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());

Review comment:
       Instead of this assert, let's use an if condition to check this. If the condition is false, we return early and skip the log message, the call to `getTasks()`, etc.
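
A small sketch of the check as an explicit condition, written without the ZooKeeper types so that it stands alone. `TaskEventFilter` and `isEventForAgent` are hypothetical names; in the watcher, the arguments would come from `ChaosConstants`, `agentName`, and `watchedEvent.getPath()`.

```java
// Hypothetical helper: decides whether a watch event belongs to this agent's
// task node. Unlike assert, this check is always active, regardless of -ea.
class TaskEventFilter {
  static boolean isEventForAgent(String statusZNode, String separator,
      String agentName, String eventPath) {
    String expectedPath = statusZNode + separator + agentName;
    // equals() on the expected path also handles a null eventPath safely.
    return expectedPath.equals(eventPath);
  }
}

// Intended use inside the watcher's process() method (sketch only):
//
//   if (!TaskEventFilter.isEventForAgent(statusZNode, separator, agentName,
//       watchedEvent.getPath())) {
//     return; // not our node: skip the log line and getTasks()
//   }
//   LOG.info("Change in Tasks Node, checking for Tasks again.");
//   getTasks();
```

Java assertions are disabled by default at runtime, so the original `assert` would normally not run at all, which is one reason an explicit condition is safer here.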

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting status setting again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exists: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Persistent ZNode tries to create
+   * ZNode again if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Ephemeral ZNode tries to create ZNode again
+   * if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: ",
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting command,
+   * after getting command successfully, it executes command and
+   * set its status with respect to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connection to the server has been lost while getting task, getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            new String(data));
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occured after setting status: " + e);
+          }
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Callback used while getting Tasks for agent if call executed without Exception,
+   * It creates a separate thread for each children to execute given Tasks parallely.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+
+        case OK:
+          if (children != null) {
+            try {
+
+              LOG.info("Executing each task as a separate thread");
+              List<Thread> tasksList = new ArrayList<>();
+              for (String task : children) {
+                String threadName = agentName + "_" + task;
+                Thread t = new Thread(() -> {
+
+                  LOG.info("Executing task : " + task + " of agent : " + agentName);
+                  zk.getData(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + task,
+                    false,
+                    getTaskForExecutionCallback,
+                    task);
+
+                });
+                t.setName(threadName);
+                t.start();
+                tasksList.add(t);
+
+                for (Thread thread : tasksList) {
+                  thread.join();
+                }
+              }
+            } catch (InterruptedException e) {
+              LOG.error("Error scheduling next task : " +
+                " for agent : " + agentName + " Error : " + e);
+            }
+          }
+
+        default:
+          LOG.error("Error occurred while getting task",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Function to create PERSISTENT ZNODE with given path and data given as params
+   * @param path Path at which ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.PERSISTENT,
+      createZNodeCallback,
+      data);
+  }
+
+  /***
+   * Function to create EPHEMERAL ZNODE with given path and data as params.
+   * @param path Path at which Ephemeral ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createEphemeralZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.EPHEMERAL,
+      createEphemeralZNodeCallback,
+      data);
+  }
+
+  /**
+   * Checks if given ZNode exists, if not creates a PERSISTENT ZNODE for same.
+   *
+   * @param path Path to check for ZNode
+   */
+  private void createIfZNodeNotExists(String path) {
+    try {
+      if (zk.exists(path,
+        false) == null) {
+        createZNode(path, new byte[0]);
+      }
+    } catch (KeeperException | InterruptedException e) {
+      LOG.error("Error checking given node : " + path + " " + e);
+    }
+  }
+
+  /**
+   * sets given Status for Task Znode
+   *
+   * @param taskZNode ZNode to set status
+   * @param status Status value
+   */
+  public void setStatusOfTaskZNode(String taskZNode, String status) {
+    LOG.info("Setting status of Task ZNode: " + taskZNode + " status : " + status);
+    zk.setData(taskZNode,
+      status.getBytes(),
+      -1,
+      setStatusOfTaskZNodeCallback,
+      null);
+  }
+
+  /**
+   * registration of ChaosAgent by checking and creating necessary ZNodes.
+   */
+  private void register() {
+    createIfZNodeNotExists(ChaosConstants.CHAOS_TEST_ROOT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName);
+
+    createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+  }
+
+  /***
+   * Gets tasks for execution, basically sets Watch on it's respective host's Znode and
+   * waits for tasks to be assigned, also has a getTasksForAgentCallback
+   * which handles execution of task.
+   */
+  private void getTasks() {
+    LOG.info("Getting Tasks for Agent: " + agentName + "and setting watch for new Tasks");
+    zk.getChildren(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+        ChaosConstants.ZNODE_PATH_SEPARATOR + agentName,
+      newTaskCreatedWatcher,
+      getTasksForAgentCallback,
+      null);
+  }
+
+  /**
+   * Below function executes command with retries with given user.
+   * Uses LocalShell to execute a command.
+   *
+   * @param user user name, default none
+   * @param cmd Command to execute
+   * @return A pair of Exit Code and Shell output
+   * @throws IOException Exception while executing shell command
+   */
+  private Pair<Integer, String> execWithRetries(String user, String cmd) throws IOException {
+    RetryCounter retryCounter = retryCounterFactory.create();
+    while (true) {
+      try {
+        return exec(user, cmd);
+      } catch (IOException e) {
+        retryOrThrow(retryCounter, e, user, cmd);
+      }
+      try {
+        retryCounter.sleepUntilNextRetry();
+      } catch (InterruptedException e) {
+        LOG.warn("Sleep Interrupted: " + e);
+      }
+    }
+  }
+
+  private Pair<Integer, String> exec(String user, String cmd) throws IOException {
+    LOG.info("Executing Shell command: " + cmd + " , user: " + user);
+
+    LocalShell shell = new LocalShell(user, cmd);
+    try {
+      shell.execute();
+    } catch (Shell.ExitCodeException e) {
+      String output = shell.getOutput();
+      throw new Shell.ExitCodeException(e.getExitCode(), "stderr: " + e.getMessage()
+        + ", stdout: " + output);
+    }
+    LOG.info("Executed Shell command, exit code: " + shell.getExitCode() +
+      " , output:" + shell.getOutput());
+
+    return new Pair<>(shell.getExitCode(), shell.getOutput());
+  }
+
+  private <E extends Exception> void retryOrThrow(RetryCounter retryCounter, E ex,
+    String user, String cmd) throws E {
+    if (retryCounter.shouldRetry()) {
+      LOG.warn("Local command: " + cmd + " , user:" + user
+        + " failed at attempt " + retryCounter.getAttemptTimes() + ". Retrying until maxAttempts: "
+        + retryCounter.getMaxAttempts() + ". Exception: " + ex.getMessage());

Review comment:
       same as other log comments: placeholders + exception as last argument

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting status setting again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exists: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Persistent ZNode tries to create
+   * ZNode again if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Ephemeral ZNode tries to create ZNode again
+   * if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: ",
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting command,
+   * after getting command successfully, it executes command and
+   * set its status with respect to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connection to the server has been lost while getting task, getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            new String(data));
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occured after setting status: " + e);
+          }
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Callback used while getting Tasks for agent if call executed without Exception,
+   * It creates a separate thread for each children to execute given Tasks parallely.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+
+        case OK:
+          if (children != null) {
+            try {
+
+              LOG.info("Executing each task as a separate thread");
+              List<Thread> tasksList = new ArrayList<>();
+              for (String task : children) {
+                String threadName = agentName + "_" + task;
+                Thread t = new Thread(() -> {
+
+                  LOG.info("Executing task : " + task + " of agent : " + agentName);
+                  zk.getData(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + task,
+                    false,
+                    getTaskForExecutionCallback,
+                    task);
+
+                });
+                t.setName(threadName);
+                t.start();
+                tasksList.add(t);
+
+                for (Thread thread : tasksList) {
+                  thread.join();
+                }
+              }
+            } catch (InterruptedException e) {
+              LOG.error("Error scheduling next task : " +
+                " for agent : " + agentName + " Error : " + e);
+            }
+          }
+
+        default:
+          LOG.error("Error occurred while getting task",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Function to create PERSISTENT ZNODE with given path and data given as params
+   * @param path Path at which ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.PERSISTENT,
+      createZNodeCallback,
+      data);
+  }
+
+  /***
+   * Function to create EPHEMERAL ZNODE with given path and data as params.
+   * @param path Path at which Ephemeral ZNode to create
+   * @param data Data to put under ZNode
+   */
+  public void createEphemeralZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.EPHEMERAL,
+      createEphemeralZNodeCallback,
+      data);
+  }
+
+  /**
+   * Checks if given ZNode exists, if not creates a PERSISTENT ZNODE for same.
+   *
+   * @param path Path to check for ZNode
+   */
+  private void createIfZNodeNotExists(String path) {
+    try {
+      if (zk.exists(path,
+        false) == null) {
+        createZNode(path, new byte[0]);
+      }
+    } catch (KeeperException | InterruptedException e) {
+      LOG.error("Error checking given node : " + path + " " + e);
+    }
+  }
+
+  /**
+   * sets given Status for Task Znode
+   *
+   * @param taskZNode ZNode to set status
+   * @param status Status value
+   */
+  public void setStatusOfTaskZNode(String taskZNode, String status) {
+    LOG.info("Setting status of Task ZNode: " + taskZNode + " status : " + status);
+    zk.setData(taskZNode,
+      status.getBytes(),
+      -1,
+      setStatusOfTaskZNodeCallback,
+      null);
+  }
+
+  /**
+   * registration of ChaosAgent by checking and creating necessary ZNodes.
+   */
+  private void register() {
+    createIfZNodeNotExists(ChaosConstants.CHAOS_TEST_ROOT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName);
+
+    createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+  }
+
+  /***
+   * Gets tasks for execution, basically sets Watch on it's respective host's Znode and
+   * waits for tasks to be assigned, also has a getTasksForAgentCallback
+   * which handles execution of task.
+   */
+  private void getTasks() {
+    LOG.info("Getting Tasks for Agent: " + agentName + "and setting watch for new Tasks");
+    zk.getChildren(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+        ChaosConstants.ZNODE_PATH_SEPARATOR + agentName,
+      newTaskCreatedWatcher,
+      getTasksForAgentCallback,
+      null);
+  }
+
+  /**
+   * Below function executes command with retries with given user.
+   * Uses LocalShell to execute a command.
+   *
+   * @param user user name, default none
+   * @param cmd Command to execute
+   * @return A pair of Exit Code and Shell output
+   * @throws IOException Exception while executing shell command
+   */
+  private Pair<Integer, String> execWithRetries(String user, String cmd) throws IOException {
+    RetryCounter retryCounter = retryCounterFactory.create();
+    while (true) {
+      try {
+        return exec(user, cmd);
+      } catch (IOException e) {
+        retryOrThrow(retryCounter, e, user, cmd);
+      }
+      try {
+        retryCounter.sleepUntilNextRetry();
+      } catch (InterruptedException e) {
+        LOG.warn("Sleep Interrupted: " + e);
+      }
+    }
+  }
+
+  private Pair<Integer, String> exec(String user, String cmd) throws IOException {
+    LOG.info("Executing Shell command: " + cmd + " , user: " + user);
+
+    LocalShell shell = new LocalShell(user, cmd);
+    try {
+      shell.execute();
+    } catch (Shell.ExitCodeException e) {
+      String output = shell.getOutput();
+      throw new Shell.ExitCodeException(e.getExitCode(), "stderr: " + e.getMessage()
+        + ", stdout: " + output);
+    }
+    LOG.info("Executed Shell command, exit code: " + shell.getExitCode() +
+      " , output:" + shell.getOutput());
+
+    return new Pair<>(shell.getExitCode(), shell.getOutput());
+  }
+
+  private <E extends Exception> void retryOrThrow(RetryCounter retryCounter, E ex,
+    String user, String cmd) throws E {
+    if (retryCounter.shouldRetry()) {
+      LOG.warn("Local command: " + cmd + " , user:" + user
+        + " failed at attempt " + retryCounter.getAttemptTimes() + ". Retrying until maxAttempts: "
+        + retryCounter.getMaxAttempts() + ". Exception: " + ex.getMessage());
+      return;
+    }
+    throw ex;
+  }
+
+  private boolean isConnected() {
+    return connected;
+  }
+
+  @Override
+  public void close() throws IOException {
+    LOG.info("Closing ZooKeeper Connection for Chaos Agent : " + agentName);
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error while closing ZooKeeper Connection.");
+    }
+  }
+
+  @Override
+  public void run() {
+    try {
+      LOG.info("Running Chaos Agent on : " + agentName);
+      while (!this.isConnected()) {
+        Thread.sleep(100);
+      }
+      this.getTasks();
+      while (!stopChaosAgent.get()) {
+        Thread.sleep(500);
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error while running Chaos Agent");

Review comment:
       Please pass `e` as the last argument so the stack trace is logged.
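
A minimal sketch of the difference, assuming nothing beyond the JDK. The PR logs through SLF4J, where the requested form is `LOG.error("Error while running Chaos Agent", e)`. To stay dependency-free, the sketch swaps in `java.util.logging`, whose equivalent overload is `Logger.log(Level, String, Throwable)`. The class `StackTraceLoggingSketch` and its capturing handler are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical illustration; not part of the PR.
class StackTraceLoggingSketch {

  /** Collects log records in memory so they can be inspected. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() {
      // Nothing is buffered.
    }

    @Override
    public void close() {
      // No resources to release.
    }
  }

  /** Logs the same message without and with the exception; returns both records. */
  static List<LogRecord> logBothWays(Exception e) {
    Logger log = Logger.getAnonymousLogger();
    log.setUseParentHandlers(false); // keep the sketch quiet on the console
    CapturingHandler handler = new CapturingHandler();
    log.addHandler(handler);

    // Message only: the handler never sees the exception.
    log.log(Level.SEVERE, "Error while running Chaos Agent");
    // Exception passed as the last argument: the record carries the Throwable.
    log.log(Level.SEVERE, "Error while running Chaos Agent", e);

    return handler.records;
  }
}
```

Only the second record has a non-null `getThrown()`, which is what a formatter needs in order to print the stack trace.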

##########
File path: hbase-it/src/test/java/org/apache/hadoop/hbase/ChaosZKClient.java
##########
@@ -0,0 +1,338 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase;
+
+import java.io.IOException;
+
+import org.apache.hadoop.hbase.util.Threads;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class ChaosZKClient {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosZKClient.class.getName());
+  private static final String CHAOS_AGENT_PARENT_ZNODE = "/hbase/chaosAgents";
+  private static final String CHAOS_AGENT_STATUS_ZNODE = "/hbase/chaosAgentTaskStatus";
+  private static final String ZNODE_PATH_SEPARATOR = "/";
+  private static final String TASK_PREFIX = "task_";
+  private static final String TASK_ERROR_STRING = "error";
+  private static final String TASK_COMPLETION_STRING = "done";
+  private static final String TASK_BOOLEAN_TRUE = "true";
+  private static final String TASK_BOOLEAN_FALSE = "false";
+  private static final String CONNECTION_LOSS = "ConnectionLoss";
+  private static final int SESSION_TIMEOUT_ZK = 10 * 60 * 1000;
+  private static final int TASK_EXECUTION_TIMEOUT = 5 * 60 * 1000;
+  private volatile String taskStatus = null;
+
+  private String quorum;

Review comment:
       nit: keep it `final`?
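
A small sketch of what the nit amounts to, assuming `quorum` is assigned exactly once in the constructor (the quoted hunk ends before the constructor, so this is an assumption). `QuorumHolderSketch` is a hypothetical stand-in, not the PR's `ChaosZKClient`.

```java
import java.util.Objects;

// Hypothetical stand-in for a client that keeps its ZooKeeper quorum string.
class QuorumHolderSketch {

  // final: assigned exactly once in the constructor;
  // the compiler rejects any later reassignment.
  private final String quorum;

  QuorumHolderSketch(String quorum) {
    this.quorum = Objects.requireNonNull(quorum, "quorum");
  }

  String getQuorum() {
    return quorum;
  }
}
```

Besides documenting intent, a `final` field is safely visible to other threads once the constructor has finished, provided `this` does not escape during construction. That may be relevant when callbacks run on a different thread than the one that built the client.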

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchersc and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting status setting again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exists: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Persistent ZNode tries to create
+   * ZNode again if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Ephemeral ZNode tries to create ZNode again
+   * if Connection was lost in previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: ",
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting command,
+   * after getting command successfully, it executes command and
+   * set its status with respect to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connection to the server has been lost while getting task, getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            ctx);
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occurred after setting status: " + e);
+          }
+          break;
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Callback used while getting tasks for the agent. If the call completes without an
+   * exception, it creates a separate thread for each child to execute the tasks in parallel.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+

Review comment:
       Wrapping each `case` body in `{ }` would make the boundaries between cases clearer:
   ```
           case CONNECTIONLOSS: {
           }
           case OK: {
           }
   ```
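
A slightly fuller sketch of the suggestion above, for illustration only. `ResultCode` and `CaseBlockDemo` are hypothetical stand-ins and are not part of the patch or of ZooKeeper. Besides marking where each case ends, the braces give every case its own scope, so the same local variable name can be reused without clashing:

```java
// Hypothetical stand-in for KeeperException.Code; not the real ZooKeeper enum.
enum ResultCode { OK, CONNECTIONLOSS, NODEEXISTS, UNKNOWN }

class CaseBlockDemo {
  // Each case body is wrapped in braces, so "msg" is scoped to its own case.
  static String handle(ResultCode rc, String path) {
    switch (rc) {
      case CONNECTIONLOSS: {
        String msg = "retrying " + path;
        return msg;
      }
      case OK: {
        String msg = "created " + path;
        return msg;
      }
      case NODEEXISTS: {
        String msg = "already registered " + path;
        return msg;
      }
      default: {
        return "error for " + path;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(handle(ResultCode.OK, "/demo"));
  }
}
```

Without the braces, declaring `String msg` in two cases of the same `switch` would not compile, because all cases share one block scope.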

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchers and LocalShell to do the killing
+ * and to get the status of a service on the targeted host without SSH.
+ * Uses the following ZNode structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting the status; setting it again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exists: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a persistent ZNode. Tries to create the
+   * ZNode again if the connection was lost in the previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating an ephemeral ZNode. Tries to create the ZNode again
+   * if the connection was lost in the previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: ",
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting command,
+   * after getting command successfully, it executes command and
+   * set its status with respect to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connection to the server has been lost while getting task, getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            ctx);
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occurred after setting status: " + e);
+          }
+          break;
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /***
+   * Callback used while getting tasks for the agent. If the call completes without an
+   * exception, it creates a separate thread for each child to execute the tasks in parallel.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+
+        case OK:
+          if (children != null) {
+            try {
+
+              LOG.info("Executing each task as a separate thread");
+              List<Thread> tasksList = new ArrayList<>();
+              for (String task : children) {
+                String threadName = agentName + "_" + task;
+                Thread t = new Thread(() -> {

Review comment:
       Same here: if we can submit a `Callable` to an `ExecutorService` and block on `Future.get()` for each submission, that would be a great option.
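
A rough sketch of what that could look like, under stated assumptions: `TaskPoolSketch`, `runTask`, and the pool size of 4 are hypothetical and only stand in for the per-task work the agent does; none of them come from the patch. Only standard `java.util.concurrent` calls are used:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TaskPoolSketch {
  // Hypothetical stand-in for fetching and executing one task ZNode.
  static String runTask(String agentName, String task) {
    return agentName + "_" + task + ":done";
  }

  // Submits one Callable per child and blocks on Future.get() for each result.
  static List<String> runAll(String agentName, List<String> children) {
    // Pool size is an arbitrary choice for this sketch.
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String task : children) {
        Callable<String> job = () -> runTask(agentName, task);
        futures.add(pool.submit(job));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        results.add(f.get()); // blocks until this particular task has finished
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IllegalStateException("Interrupted while waiting for tasks", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Task failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runAll("agent1", Arrays.asList("task0000001", "task0000002")));
  }
}
```

Compared with creating a raw `Thread` per child, the pool bounds the number of concurrent tasks, and `Future.get()` surfaces a task's failure as an `ExecutionException` rather than losing it in the worker thread.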

##########
File path: hbase-it/src/test/java/org/apache/hadoop/hbase/ChaosZKClient.java
##########
@@ -0,0 +1,338 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase;
+
+import java.io.IOException;
+
+import org.apache.hadoop.hbase.util.Threads;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class ChaosZKClient {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosZKClient.class.getName());
+  private static final String CHAOS_AGENT_PARENT_ZNODE = "/hbase/chaosAgents";
+  private static final String CHAOS_AGENT_STATUS_ZNODE = "/hbase/chaosAgentTaskStatus";
+  private static final String ZNODE_PATH_SEPARATOR = "/";
+  private static final String TASK_PREFIX = "task_";
+  private static final String TASK_ERROR_STRING = "error";
+  private static final String TASK_COMPLETION_STRING = "done";
+  private static final String TASK_BOOLEAN_TRUE = "true";
+  private static final String TASK_BOOLEAN_FALSE = "false";
+  private static final String CONNECTION_LOSS = "ConnectionLoss";
+  private static final int SESSION_TIMEOUT_ZK = 10 * 60 * 1000;
+  private static final int TASK_EXECUTION_TIMEOUT = 5 * 60 * 1000;
+  private volatile String taskStatus = null;
+
+  private String quorum;
+  private ZooKeeper zk;
+
+  public ChaosZKClient(String quorum) {
+    this.quorum = quorum;
+    try {
+      this.createNewZKConnection();
+    } catch (IOException e) {
+      LOG.error("Error creating ZooKeeper Connection: " + e);

Review comment:
       Pass the exception as the last argument to the logger call instead of concatenating it into the message.
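
To illustrate why this matters, here is a small sketch. It uses `java.util.logging` purely so that it runs without extra dependencies; that is a substitution, not what the patch uses. With SLF4J, as in the patch, the corresponding call would be `LOG.error("Error creating ZooKeeper Connection", e)`. `LogThrowableSketch` and `CapturingHandler` are hypothetical names:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogThrowableSketch {
  // Collects log records in memory so they can be inspected afterwards.
  static class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();
    @Override public void publish(LogRecord record) { records.add(record); }
    @Override public void flush() { }
    @Override public void close() { }
  }

  static List<LogRecord> logBothWays() {
    Logger log = Logger.getLogger("LogThrowableSketch");
    log.setUseParentHandlers(false); // keep the console quiet for this demo
    CapturingHandler handler = new CapturingHandler();
    log.addHandler(handler);
    try {
      IOException e = new IOException("connection refused");
      // Concatenation: only e.toString() ends up in the message, no stack trace.
      log.log(Level.SEVERE, "Error creating ZooKeeper Connection: " + e);
      // Exception as last argument: the logger keeps the Throwable itself.
      log.log(Level.SEVERE, "Error creating ZooKeeper Connection", e);
    } finally {
      log.removeHandler(handler);
    }
    return handler.records;
  }

  public static void main(String[] args) {
    for (LogRecord r : logBothWays()) {
      System.out.println(r.getMessage() + " | thrown=" + r.getThrown());
    }
  }
}
```

In the first record the throwable is absent, so no stack trace can be printed; in the second the logging backend has the full exception available.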

##########
File path: hbase-it/src/test/java/org/apache/hadoop/hbase/ChaosZKClient.java
##########
@@ -0,0 +1,338 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase;
+
+import java.io.IOException;
+
+import org.apache.hadoop.hbase.util.Threads;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class ChaosZKClient {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosZKClient.class.getName());
+  private static final String CHAOS_AGENT_PARENT_ZNODE = "/hbase/chaosAgents";
+  private static final String CHAOS_AGENT_STATUS_ZNODE = "/hbase/chaosAgentTaskStatus";
+  private static final String ZNODE_PATH_SEPARATOR = "/";
+  private static final String TASK_PREFIX = "task_";
+  private static final String TASK_ERROR_STRING = "error";
+  private static final String TASK_COMPLETION_STRING = "done";
+  private static final String TASK_BOOLEAN_TRUE = "true";
+  private static final String TASK_BOOLEAN_FALSE = "false";
+  private static final String CONNECTION_LOSS = "ConnectionLoss";
+  private static final int SESSION_TIMEOUT_ZK = 10 * 60 * 1000;
+  private static final int TASK_EXECUTION_TIMEOUT = 5 * 60 * 1000;
+  private volatile String taskStatus = null;
+
+  private String quorum;
+  private ZooKeeper zk;
+
+  public ChaosZKClient(String quorum) {
+    this.quorum = quorum;
+    try {
+      this.createNewZKConnection();
+    } catch (IOException e) {
+      LOG.error("Error creating ZooKeeper Connection: " + e);
+    }
+  }
+
+  /**
+   * Creates connection with ZooKeeper
+   * @throws IOException when not able to create connection properly
+   */
+  public void createNewZKConnection() throws IOException {
+    Watcher watcher = new Watcher() {
+      @Override
+      public void process(WatchedEvent watchedEvent) {
+        LOG.info("Created ZooKeeper Connection For executing task");
+      }
+    };
+
+    this.zk = new ZooKeeper(quorum, SESSION_TIMEOUT_ZK, watcher);
+  }
+
+  /**
+   * Checks if ChaosAgent is running or not on target host by checking its ZNode.
+   * @param hostname hostname to check for chaosagent
+   * @return true/false whether agent is running or not
+   */
+  private boolean isChaosAgentRunning(String hostname) {
+    try {
+      return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+        false) != null;
+    } catch (KeeperException e) {
+      if (e.toString().contains(CONNECTION_LOSS)) {
+        recreateZKConnection();
+        try {
+          return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+            false) != null;
+        } catch (KeeperException  | InterruptedException ie) {
+          LOG.error("ERROR " + ie);
+        }
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error checking for given hostname: " + hostname + " ERROR: " + e);
+    }
+    return false;
+  }
+
+  /**
+   * Creates tasks for target hosts by creating ZNodes.
+   * Waits for a limited amount of time to complete task to execute.
+   * @param taskObject Object data represents command
+   * @return returns status
+   */
+  public String submitTask(final TaskObject taskObject) {
+    if (isChaosAgentRunning(taskObject.getTaskHostname())) {
+      LOG.info("Creating task node");
+      zk.create(CHAOS_AGENT_STATUS_ZNODE + ZNODE_PATH_SEPARATOR +
+          taskObject.getTaskHostname() + ZNODE_PATH_SEPARATOR + TASK_PREFIX,
+        taskObject.getCommand().getBytes(),
+        ZooDefs.Ids.OPEN_ACL_UNSAFE,
+        CreateMode.EPHEMERAL_SEQUENTIAL,
+        submitTaskCallback,
+        taskObject);
+      long start = System.currentTimeMillis();
+
+      while ((System.currentTimeMillis() - start) < TASK_EXECUTION_TIMEOUT) {
+        if(taskStatus != null) {
+          return taskStatus;
+        }
+        Threads.sleep(500);
+      }
+    } else {
+      LOG.info("EHHHHH!  ChaosAgent Not running");
+    }
+    return TASK_ERROR_STRING;
+  }
+
+  /**
+   * To get status of task submitted
+   * @param path path at which to get status
+   * @param ctx path context
+   */
+  private void getStatus(String path , Object ctx) {
+    LOG.info("Getting Status of task: " + path);
+    zk.getData(path,
+      false,
+      getStatusCallback,
+      ctx);
+  }
+
+  /**
+   * Set a watch on task submitted
+   * @param name ZNode name to set a watch
+   * @param taskObject context for ZNode name
+   */
+  private void setStatusWatch(String name, TaskObject taskObject) {
+    LOG.info("Checking for ZNode and Setting watch for task : " + name);
+    zk.exists(name,
+      setStatusWatcher,
+      setStatusWatchCallback,
+      taskObject);
+  }
+
+  /**
+   * Delete task after getting its status
+   * @param path path to delete ZNode
+   */
+  private void deleteTask(String path) {
+    LOG.info("Deleting task: " + path);
+    zk.delete(path,
+      -1,
+      taskDeleteCallback,
+      null);
+  }
+
+  //WATCHERS:
+
+  /**
+   * Watcher to get notification whenever status of task changes.
+   */
+  Watcher setStatusWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      LOG.info("Setting status watch for task: " + watchedEvent.getPath());
+      if(watchedEvent.getType() == Event.EventType.NodeDataChanged) {
+        assert watchedEvent.getPath().contains(TASK_PREFIX);
+        getStatus(watchedEvent.getPath(), (Object) watchedEvent.getPath());
+
+      }
+    }
+  };
+
+  //CALLBACKS
+
+  AsyncCallback.DataCallback getStatusCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while getting status of task, getting again
+          recreateZKConnection();
+          getStatus(path, ctx);
+          break;
+
+        case OK:
+          if (ctx!=null) {
+
+            String status = new String(data);
+            taskStatus = status;
+            switch (status) {
+              case TASK_COMPLETION_STRING:
+              case TASK_BOOLEAN_TRUE:
+              case TASK_BOOLEAN_FALSE:
+                LOG.info("Task executed completely : Status --> " + status);
+                break;
+
+              case TASK_ERROR_STRING:
+                LOG.info("There was error while executing task : Status --> " + status);
+                break;
+
+              default:
+                LOG.warn("Status of task is undefined!! : Status --> " + status);
+            }
+
+            deleteTask(path);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while getting status of task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StatCallback setStatusWatchCallback = new AsyncCallback.StatCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //ConnectionLoss while setting watch on status ZNode, setting again.
+          recreateZKConnection();
+          setStatusWatch(path, (TaskObject) ctx);
+          break;
+
+        case OK:
+          if(stat != null) {
+            getStatus(path, null);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while setting watch on task ZNode: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StringCallback submitTaskCallback = new AsyncCallback.StringCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, String name) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to server was lost while submitting task, submitting again.
+          recreateZKConnection();
+          submitTask((TaskObject) ctx);
+          break;
+
+        case OK:
+          LOG.info("Task created : " + name);
+          setStatusWatch(name, (TaskObject) ctx);
+          break;
+
+        default:
+          LOG.error("Error submitting task: " + name + " ERROR:" +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.VoidCallback taskDeleteCallback = new AsyncCallback.VoidCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while deleting task, deleting again
+          recreateZKConnection();
+          deleteTask(path);
+          break;
+
+        case OK:
+          LOG.info("Task Deleted successfully!");
+          LOG.info("Closing ZooKeeper Connection");
+          try {
+            zk.close();
+          } catch (InterruptedException e) {
+            LOG.error("Error while closing ZooKeeper Connection.");
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while deleting task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+
+  private void recreateZKConnection() {
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error closing ZK connection : " + e);

Review comment:
       Same as the others: the log call should take the exception as an argument.

##########
File path: hbase-it/src/test/java/org/apache/hadoop/hbase/ChaosZKClient.java
##########
@@ -0,0 +1,338 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase;
+
+import java.io.IOException;
+
+import org.apache.hadoop.hbase.util.Threads;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class ChaosZKClient {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosZKClient.class.getName());
+  private static final String CHAOS_AGENT_PARENT_ZNODE = "/hbase/chaosAgents";
+  private static final String CHAOS_AGENT_STATUS_ZNODE = "/hbase/chaosAgentTaskStatus";
+  private static final String ZNODE_PATH_SEPARATOR = "/";
+  private static final String TASK_PREFIX = "task_";
+  private static final String TASK_ERROR_STRING = "error";
+  private static final String TASK_COMPLETION_STRING = "done";
+  private static final String TASK_BOOLEAN_TRUE = "true";
+  private static final String TASK_BOOLEAN_FALSE = "false";
+  private static final String CONNECTION_LOSS = "ConnectionLoss";
+  private static final int SESSION_TIMEOUT_ZK = 10 * 60 * 1000;
+  private static final int TASK_EXECUTION_TIMEOUT = 5 * 60 * 1000;
+  private volatile String taskStatus = null;
+
+  private String quorum;
+  private ZooKeeper zk;
+
+  public ChaosZKClient(String quorum) {
+    this.quorum = quorum;
+    try {
+      this.createNewZKConnection();
+    } catch (IOException e) {
+      LOG.error("Error creating ZooKeeper Connection: " + e);
+    }
+  }
+
+  /**
+   * Creates connection with ZooKeeper
+   * @throws IOException when not able to create connection properly
+   */
+  public void createNewZKConnection() throws IOException {
+    Watcher watcher = new Watcher() {
+      @Override
+      public void process(WatchedEvent watchedEvent) {
+        LOG.info("Created ZooKeeper Connection For executing task");
+      }
+    };
+
+    this.zk = new ZooKeeper(quorum, SESSION_TIMEOUT_ZK, watcher);
+  }
+
+  /**
+   * Checks if ChaosAgent is running or not on target host by checking its ZNode.
+   * @param hostname hostname to check for chaosagent
+   * @return true/false whether agent is running or not
+   */
+  private boolean isChaosAgentRunning(String hostname) {
+    try {
+      return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+        false) != null;
+    } catch (KeeperException e) {
+      if (e.toString().contains(CONNECTION_LOSS)) {
+        recreateZKConnection();
+        try {
+          return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+            false) != null;
+        } catch (KeeperException  | InterruptedException ie) {
+          LOG.error("ERROR " + ie);
+        }
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error checking for given hostname: " + hostname + " ERROR: " + e);
+    }
+    return false;
+  }
+
+  /**
+   * Creates tasks for target hosts by creating ZNodes.
+   * Waits for a limited amount of time to complete task to execute.
+   * @param taskObject Object data represents command
+   * @return returns status
+   */
+  public String submitTask(final TaskObject taskObject) {
+    if (isChaosAgentRunning(taskObject.getTaskHostname())) {
+      LOG.info("Creating task node");
+      zk.create(CHAOS_AGENT_STATUS_ZNODE + ZNODE_PATH_SEPARATOR +
+          taskObject.getTaskHostname() + ZNODE_PATH_SEPARATOR + TASK_PREFIX,
+        taskObject.getCommand().getBytes(),
+        ZooDefs.Ids.OPEN_ACL_UNSAFE,
+        CreateMode.EPHEMERAL_SEQUENTIAL,
+        submitTaskCallback,
+        taskObject);
+      long start = System.currentTimeMillis();
+
+      while ((System.currentTimeMillis() - start) < TASK_EXECUTION_TIMEOUT) {
+        if(taskStatus != null) {
+          return taskStatus;
+        }
+        Threads.sleep(500);
+      }
+    } else {
+      LOG.info("EHHHHH!  ChaosAgent Not running");
+    }
+    return TASK_ERROR_STRING;
+  }
+
+  /**
+   * To get status of task submitted
+   * @param path path at which to get status
+   * @param ctx path context
+   */
+  private void getStatus(String path , Object ctx) {
+    LOG.info("Getting Status of task: " + path);
+    zk.getData(path,
+      false,
+      getStatusCallback,
+      ctx);
+  }
+
+  /**
+   * Set a watch on task submitted
+   * @param name ZNode name to set a watch
+   * @param taskObject context for ZNode name
+   */
+  private void setStatusWatch(String name, TaskObject taskObject) {
+    LOG.info("Checking for ZNode and Setting watch for task : " + name);
+    zk.exists(name,
+      setStatusWatcher,
+      setStatusWatchCallback,
+      taskObject);
+  }
+
+  /**
+   * Delete task after getting its status
+   * @param path path to delete ZNode
+   */
+  private void deleteTask(String path) {
+    LOG.info("Deleting task: " + path);
+    zk.delete(path,
+      -1,
+      taskDeleteCallback,
+      null);
+  }
+
+  //WATCHERS:
+
+  /**
+   * Watcher to get notification whenever status of task changes.
+   */
+  Watcher setStatusWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      LOG.info("Setting status watch for task: " + watchedEvent.getPath());
+      if(watchedEvent.getType() == Event.EventType.NodeDataChanged) {
+        assert watchedEvent.getPath().contains(TASK_PREFIX);
+        getStatus(watchedEvent.getPath(), (Object) watchedEvent.getPath());
+
+      }
+    }
+  };
+
+  //CALLBACKS
+
+  AsyncCallback.DataCallback getStatusCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while getting status of task, getting again
+          recreateZKConnection();
+          getStatus(path, ctx);
+          break;
+
+        case OK:
+          if (ctx!=null) {
+
+            String status = new String(data);
+            taskStatus = status;
+            switch (status) {
+              case TASK_COMPLETION_STRING:
+              case TASK_BOOLEAN_TRUE:
+              case TASK_BOOLEAN_FALSE:
+                LOG.info("Task executed completely : Status --> " + status);
+                break;
+
+              case TASK_ERROR_STRING:
+                LOG.info("There was error while executing task : Status --> " + status);
+                break;
+
+              default:
+                LOG.warn("Status of task is undefined!! : Status --> " + status);
+            }
+
+            deleteTask(path);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while getting status of task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StatCallback setStatusWatchCallback = new AsyncCallback.StatCallback() {

Review comment:
       nit: use lambda similar to `setStatusOfTaskZNodeCallback`?
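
For illustration, the two forms side by side. This is only a sketch: `StatCallbackLike` is a hypothetical stand-in with the same single-method shape as `AsyncCallback.StatCallback`, so the snippet compiles without ZooKeeper on the classpath, and `EVENTS` is a made-up list used only to record the calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.zookeeper.AsyncCallback.StatCallback:
// one abstract method, so a lambda can implement it.
@FunctionalInterface
interface StatCallbackLike {
  void processResult(int rc, String path, Object ctx, Object stat);
}

class LambdaCallbackSketch {
  // Records which callback ran; for demonstration only.
  static final List<String> EVENTS = new ArrayList<>();

  // Anonymous-class form, as used for the callbacks in ChaosZKClient.
  static final StatCallbackLike ANONYMOUS = new StatCallbackLike() {
    @Override
    public void processResult(int rc, String path, Object ctx, Object stat) {
      EVENTS.add("anonymous:" + rc + ":" + path);
    }
  };

  // Lambda form with the same behavior, as used in ChaosAgent.
  static final StatCallbackLike LAMBDA = (rc, path, ctx, stat) ->
      EVENTS.add("lambda:" + rc + ":" + path);

  public static void main(String[] args) {
    ANONYMOUS.processResult(0, "/task_1", null, null);
    LAMBDA.processResult(0, "/task_1", null, null);
    System.out.println(EVENTS);
  }
}
```

The lambda drops the `new ...() { @Override ... }` wrapper and leaves only the parameter list and body, which is the shape `setStatusOfTaskZNodeCallback` already has.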

##########
File path: hbase-it/src/test/java/org/apache/hadoop/hbase/ChaosZKClient.java
##########
@@ -0,0 +1,338 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase;
+
+import java.io.IOException;
+
+import org.apache.hadoop.hbase.util.Threads;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class ChaosZKClient {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosZKClient.class.getName());
+  private static final String CHAOS_AGENT_PARENT_ZNODE = "/hbase/chaosAgents";
+  private static final String CHAOS_AGENT_STATUS_ZNODE = "/hbase/chaosAgentTaskStatus";
+  private static final String ZNODE_PATH_SEPARATOR = "/";
+  private static final String TASK_PREFIX = "task_";
+  private static final String TASK_ERROR_STRING = "error";
+  private static final String TASK_COMPLETION_STRING = "done";
+  private static final String TASK_BOOLEAN_TRUE = "true";
+  private static final String TASK_BOOLEAN_FALSE = "false";
+  private static final String CONNECTION_LOSS = "ConnectionLoss";
+  private static final int SESSION_TIMEOUT_ZK = 10 * 60 * 1000;
+  private static final int TASK_EXECUTION_TIMEOUT = 5 * 60 * 1000;
+  private volatile String taskStatus = null;
+
+  private String quorum;
+  private ZooKeeper zk;
+
+  public ChaosZKClient(String quorum) {
+    this.quorum = quorum;
+    try {
+      this.createNewZKConnection();
+    } catch (IOException e) {
+      LOG.error("Error creating ZooKeeper Connection: " + e);
+    }
+  }
+
+  /**
+   * Creates connection with ZooKeeper
+   * @throws IOException when not able to create connection properly
+   */
+  public void createNewZKConnection() throws IOException {
+    Watcher watcher = new Watcher() {
+      @Override
+      public void process(WatchedEvent watchedEvent) {
+        LOG.info("Created ZooKeeper Connection For executing task");
+      }
+    };
+
+    this.zk = new ZooKeeper(quorum, SESSION_TIMEOUT_ZK, watcher);
+  }
+
+  /**
+   * Checks if ChaosAgent is running or not on target host by checking its ZNode.
+   * @param hostname hostname to check for chaosagent
+   * @return true/false whether agent is running or not
+   */
+  private boolean isChaosAgentRunning(String hostname) {
+    try {
+      return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+        false) != null;
+    } catch (KeeperException e) {
+      if (e.toString().contains(CONNECTION_LOSS)) {
+        recreateZKConnection();
+        try {
+          return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+            false) != null;
+        } catch (KeeperException  | InterruptedException ie) {
+          LOG.error("ERROR " + ie);
+        }
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error checking for given hostname: " + hostname + " ERROR: " + e);
+    }
+    return false;
+  }
+
+  /**
+   * Creates tasks for target hosts by creating ZNodes.
+   * Waits for a limited amount of time to complete task to execute.
+   * @param taskObject Object data represents command
+   * @return returns status
+   */
+  public String submitTask(final TaskObject taskObject) {
+    if (isChaosAgentRunning(taskObject.getTaskHostname())) {
+      LOG.info("Creating task node");
+      zk.create(CHAOS_AGENT_STATUS_ZNODE + ZNODE_PATH_SEPARATOR +
+          taskObject.getTaskHostname() + ZNODE_PATH_SEPARATOR + TASK_PREFIX,
+        taskObject.getCommand().getBytes(),
+        ZooDefs.Ids.OPEN_ACL_UNSAFE,
+        CreateMode.EPHEMERAL_SEQUENTIAL,
+        submitTaskCallback,
+        taskObject);
+      long start = System.currentTimeMillis();
+
+      while ((System.currentTimeMillis() - start) < TASK_EXECUTION_TIMEOUT) {
+        if(taskStatus != null) {
+          return taskStatus;
+        }
+        Threads.sleep(500);
+      }
+    } else {
+      LOG.info("EHHHHH!  ChaosAgent Not running");
+    }
+    return TASK_ERROR_STRING;
+  }
+
+  /**
+   * To get status of task submitted
+   * @param path path at which to get status
+   * @param ctx path context
+   */
+  private void getStatus(String path , Object ctx) {
+    LOG.info("Getting Status of task: " + path);
+    zk.getData(path,
+      false,
+      getStatusCallback,
+      ctx);
+  }
+
+  /**
+   * Set a watch on task submitted
+   * @param name ZNode name to set a watch
+   * @param taskObject context for ZNode name
+   */
+  private void setStatusWatch(String name, TaskObject taskObject) {
+    LOG.info("Checking for ZNode and Setting watch for task : " + name);
+    zk.exists(name,
+      setStatusWatcher,
+      setStatusWatchCallback,
+      taskObject);
+  }
+
+  /**
+   * Delete task after getting its status
+   * @param path path to delete ZNode
+   */
+  private void deleteTask(String path) {
+    LOG.info("Deleting task: " + path);
+    zk.delete(path,
+      -1,
+      taskDeleteCallback,
+      null);
+  }
+
+  //WATCHERS:
+
+  /**
+   * Watcher to get notification whenever status of task changes.
+   */
+  Watcher setStatusWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      LOG.info("Setting status watch for task: " + watchedEvent.getPath());
+      if(watchedEvent.getType() == Event.EventType.NodeDataChanged) {
+        assert watchedEvent.getPath().contains(TASK_PREFIX);
+        getStatus(watchedEvent.getPath(), (Object) watchedEvent.getPath());
+
+      }
+    }
+  };
+
+  //CALLBACKS
+
+  AsyncCallback.DataCallback getStatusCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while getting status of task, getting again
+          recreateZKConnection();
+          getStatus(path, ctx);
+          break;
+
+        case OK:
+          if (ctx!=null) {
+
+            String status = new String(data);
+            taskStatus = status;
+            switch (status) {
+              case TASK_COMPLETION_STRING:
+              case TASK_BOOLEAN_TRUE:
+              case TASK_BOOLEAN_FALSE:
+                LOG.info("Task executed completely : Status --> " + status);
+                break;
+
+              case TASK_ERROR_STRING:
+                LOG.info("There was error while executing task : Status --> " + status);
+                break;
+
+              default:
+                LOG.warn("Status of task is undefined!! : Status --> " + status);
+            }
+
+            deleteTask(path);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while getting status of task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StatCallback setStatusWatchCallback = new AsyncCallback.StatCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //ConnectionLoss while setting watch on status ZNode, setting again.
+          recreateZKConnection();
+          setStatusWatch(path, (TaskObject) ctx);
+          break;
+
+        case OK:
+          if(stat != null) {
+            getStatus(path, null);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while setting watch on task ZNode: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StringCallback submitTaskCallback = new AsyncCallback.StringCallback() {

Review comment:
       nit: use lambda similar to `setStatusOfTaskZNodeCallback`?

##########
File path: hbase-it/src/test/java/org/apache/hadoop/hbase/ChaosZKClient.java
##########
@@ -0,0 +1,338 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase;
+
+import java.io.IOException;
+
+import org.apache.hadoop.hbase.util.Threads;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class ChaosZKClient {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosZKClient.class.getName());
+  private static final String CHAOS_AGENT_PARENT_ZNODE = "/hbase/chaosAgents";
+  private static final String CHAOS_AGENT_STATUS_ZNODE = "/hbase/chaosAgentTaskStatus";
+  private static final String ZNODE_PATH_SEPARATOR = "/";
+  private static final String TASK_PREFIX = "task_";
+  private static final String TASK_ERROR_STRING = "error";
+  private static final String TASK_COMPLETION_STRING = "done";
+  private static final String TASK_BOOLEAN_TRUE = "true";
+  private static final String TASK_BOOLEAN_FALSE = "false";
+  private static final String CONNECTION_LOSS = "ConnectionLoss";
+  private static final int SESSION_TIMEOUT_ZK = 10 * 60 * 1000;
+  private static final int TASK_EXECUTION_TIMEOUT = 5 * 60 * 1000;
+  private volatile String taskStatus = null;
+
+  private String quorum;
+  private ZooKeeper zk;
+
+  public ChaosZKClient(String quorum) {
+    this.quorum = quorum;
+    try {
+      this.createNewZKConnection();
+    } catch (IOException e) {
+      LOG.error("Error creating ZooKeeper Connection: " + e);
+    }
+  }
+
+  /**
+   * Creates connection with ZooKeeper
+   * @throws IOException when not able to create connection properly
+   */
+  public void createNewZKConnection() throws IOException {
+    Watcher watcher = new Watcher() {
+      @Override
+      public void process(WatchedEvent watchedEvent) {
+        LOG.info("Created ZooKeeper Connection For executing task");
+      }
+    };
+
+    this.zk = new ZooKeeper(quorum, SESSION_TIMEOUT_ZK, watcher);
+  }
+
+  /**
+   * Checks if ChaosAgent is running or not on target host by checking its ZNode.
+   * @param hostname hostname to check for chaosagent
+   * @return true/false whether agent is running or not
+   */
+  private boolean isChaosAgentRunning(String hostname) {
+    try {
+      return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+        false) != null;
+    } catch (KeeperException e) {
+      if (e.toString().contains(CONNECTION_LOSS)) {
+        recreateZKConnection();
+        try {
+          return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+            false) != null;
+        } catch (KeeperException  | InterruptedException ie) {
+          LOG.error("ERROR " + ie);
+        }
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error checking for given hostname: " + hostname + " ERROR: " + e);
+    }
+    return false;
+  }
+
+  /**
+   * Creates tasks for target hosts by creating ZNodes.
+   * Waits for a limited amount of time to complete task to execute.
+   * @param taskObject Object data represents command
+   * @return returns status
+   */
+  public String submitTask(final TaskObject taskObject) {
+    if (isChaosAgentRunning(taskObject.getTaskHostname())) {
+      LOG.info("Creating task node");
+      zk.create(CHAOS_AGENT_STATUS_ZNODE + ZNODE_PATH_SEPARATOR +
+          taskObject.getTaskHostname() + ZNODE_PATH_SEPARATOR + TASK_PREFIX,
+        taskObject.getCommand().getBytes(),
+        ZooDefs.Ids.OPEN_ACL_UNSAFE,
+        CreateMode.EPHEMERAL_SEQUENTIAL,
+        submitTaskCallback,
+        taskObject);
+      long start = System.currentTimeMillis();
+
+      while ((System.currentTimeMillis() - start) < TASK_EXECUTION_TIMEOUT) {
+        if(taskStatus != null) {
+          return taskStatus;
+        }
+        Threads.sleep(500);
+      }
+    } else {
+      LOG.info("EHHHHH!  ChaosAgent Not running");
+    }
+    return TASK_ERROR_STRING;
+  }
+
+  /**
+   * To get status of task submitted
+   * @param path path at which to get status
+   * @param ctx path context
+   */
+  private void getStatus(String path , Object ctx) {
+    LOG.info("Getting Status of task: " + path);
+    zk.getData(path,
+      false,
+      getStatusCallback,
+      ctx);
+  }
+
+  /**
+   * Set a watch on task submitted
+   * @param name ZNode name to set a watch
+   * @param taskObject context for ZNode name
+   */
+  private void setStatusWatch(String name, TaskObject taskObject) {
+    LOG.info("Checking for ZNode and Setting watch for task : " + name);
+    zk.exists(name,
+      setStatusWatcher,
+      setStatusWatchCallback,
+      taskObject);
+  }
+
+  /**
+   * Delete task after getting its status
+   * @param path path to delete ZNode
+   */
+  private void deleteTask(String path) {
+    LOG.info("Deleting task: " + path);
+    zk.delete(path,
+      -1,
+      taskDeleteCallback,
+      null);
+  }
+
+  //WATCHERS:
+
+  /**
+   * Watcher to get notification whenever status of task changes.
+   */
+  Watcher setStatusWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      LOG.info("Setting status watch for task: " + watchedEvent.getPath());
+      if(watchedEvent.getType() == Event.EventType.NodeDataChanged) {
+        assert watchedEvent.getPath().contains(TASK_PREFIX);
+        getStatus(watchedEvent.getPath(), (Object) watchedEvent.getPath());
+
+      }
+    }
+  };
+
+  //CALLBACKS
+
+  AsyncCallback.DataCallback getStatusCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while getting status of task, getting again
+          recreateZKConnection();
+          getStatus(path, ctx);
+          break;
+
+        case OK:
+          if (ctx!=null) {
+
+            String status = new String(data);
+            taskStatus = status;
+            switch (status) {
+              case TASK_COMPLETION_STRING:
+              case TASK_BOOLEAN_TRUE:
+              case TASK_BOOLEAN_FALSE:
+                LOG.info("Task executed completely : Status --> " + status);
+                break;
+
+              case TASK_ERROR_STRING:
+                LOG.info("There was error while executing task : Status --> " + status);
+                break;
+
+              default:
+                LOG.warn("Status of task is undefined!! : Status --> " + status);
+            }
+
+            deleteTask(path);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while getting status of task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StatCallback setStatusWatchCallback = new AsyncCallback.StatCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //ConnectionLoss while setting watch on status ZNode, setting again.
+          recreateZKConnection();
+          setStatusWatch(path, (TaskObject) ctx);
+          break;
+
+        case OK:
+          if(stat != null) {
+            getStatus(path, null);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while setting watch on task ZNode: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StringCallback submitTaskCallback = new AsyncCallback.StringCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, String name) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to server was lost while submitting task, submitting again.
+          recreateZKConnection();
+          submitTask((TaskObject) ctx);
+          break;
+
+        case OK:
+          LOG.info("Task created : " + name);
+          setStatusWatch(name, (TaskObject) ctx);
+          break;
+
+        default:
+          LOG.error("Error submitting task: " + name + " ERROR:" +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.VoidCallback taskDeleteCallback = new AsyncCallback.VoidCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while deleting task, deleting again
+          recreateZKConnection();
+          deleteTask(path);
+          break;
+
+        case OK:
+          LOG.info("Task Deleted successfully!");
+          LOG.info("Closing ZooKeeper Connection");
+          try {
+            zk.close();
+          } catch (InterruptedException e) {
+            LOG.error("Error while closing ZooKeeper Connection.");
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while deleting task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+
+  private void recreateZKConnection() {
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error closing ZK connection : " + e);
+    } finally {
+      try {
+        createNewZKConnection();
+      } catch (IOException e) {
+        LOG.error("Error creating new ZK Connection for agent: " + e);
+      }
+    }
+  }
+
+  static class TaskObject {
+    private String command;
+    private String taskHostname;

Review comment:
       nit: let's keep both as `final`
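
Roughly what this could look like. This is a sketch under assumptions: the rest of `TaskObject` is not visible in this hunk, so the constructor below and the class name `TaskObjectSketch` are hypothetical; only the two fields and the getters `getCommand()` / `getTaskHostname()` come from the patch.

```java
// Hypothetical immutable variant of ChaosZKClient.TaskObject.
final class TaskObjectSketch {
  // Both fields are assigned exactly once, in the constructor.
  private final String command;
  private final String taskHostname;

  // Assumed constructor shape; the real one is not shown in this hunk.
  TaskObjectSketch(String command, String taskHostname) {
    this.command = command;
    this.taskHostname = taskHostname;
  }

  String getCommand() {
    return command;
  }

  String getTaskHostname() {
    return taskHostname;
  }

  public static void main(String[] args) {
    TaskObjectSketch task = new TaskObjectSketch("exec kill -9 1234", "host-1.example.com");
    System.out.println(task.getTaskHostname() + " <- " + task.getCommand());
  }
}
```

This only works if nothing reassigns the fields after construction, so any setters would have to go.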

##########
File path: hbase-it/src/main/java/org/apache/hadoop/hbase/chaos/ChaosAgent.java
##########
@@ -0,0 +1,596 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase.chaos;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicBoolean;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.util.Pair;
+import org.apache.hadoop.hbase.util.RetryCounter;
+import org.apache.hadoop.hbase.util.RetryCounterFactory;
+import org.apache.hadoop.util.Shell;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/***
+ * An agent for executing destructive actions for ChaosMonkey.
+ * Uses ZooKeeper Watchers and LocalShell, to do the killing
+ * and getting status of service on targeted host without SSH.
+ * uses given ZNode Structure:
+ *  /perfChaosTest (root)
+ *              |
+ *              |
+ *              /chaosAgents (Used for registration has
+ *              hostname ephemeral nodes as children)
+ *              |
+ *              |
+ *              /chaosAgentTaskStatus (Used for task
+ *              Execution, has hostname persistent
+ *              nodes as child with tasks as their children)
+ *                          |
+ *                          |
+ *                          /hostname
+ *                                |
+ *                                |
+ *                                /task0000001 (command as data)
+ *                                (has two types of command :
+ *                                     1: starts with "exec"
+ *                                       for executing a destructive action.
+ *                                     2: starts with "bool" for getting
+ *                                       only status of service.
+ *
+ */
+@InterfaceAudience.Private
+public class ChaosAgent implements Watcher, Closeable, Runnable {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosAgent.class.getName());
+  static AtomicBoolean stopChaosAgent = new AtomicBoolean();
+  private ZooKeeper zk;
+  private String quorum;
+  private String agentName;
+  private Configuration conf;
+  private RetryCounterFactory retryCounterFactory;
+  private volatile boolean connected = false;
+
+  public ChaosAgent(Configuration conf, String quorum, String agentName) {
+    initChaosAgent(conf, quorum, agentName);
+  }
+
+  /***
+   * sets global params and initiates connection with ZooKeeper then does registration.
+   * @param conf initial configuration to use
+   * @param quorum ZK Quorum
+   * @param agentName AgentName to use
+   */
+  private void initChaosAgent(Configuration conf, String quorum, String agentName) {
+    this.conf = conf;
+    this.quorum = quorum;
+    this.agentName = agentName;
+    this.retryCounterFactory = new RetryCounterFactory(new RetryCounter.RetryConfig()
+      .setMaxAttempts(conf.getInt(ChaosConstants.RETRY_ATTEMPTS_KEY,
+        ChaosConstants.DEFAULT_RETRY_ATTEMPTS)).setSleepInterval(
+          conf.getLong(ChaosConstants.RETRY_SLEEP_INTERVAL_KEY,
+            ChaosConstants.DEFAULT_RETRY_SLEEP_INTERVAL)));
+    try {
+      this.createZKConnection(null);
+      this.register();
+    } catch (IOException e) {
+      LOG.error("Error Creating Connection: " + e);
+    }
+  }
+
+  /***
+   * Creates Connection with ZooKeeper.
+   * @throws IOException if something goes wrong
+   */
+  private void createZKConnection(Watcher watcher) throws IOException {
+    if(watcher == null) {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, this);
+    } else {
+      zk = new ZooKeeper(quorum, ChaosConstants.SESSION_TIMEOUT_ZK, watcher);
+    }
+    LOG.info("ZooKeeper Connection created for ChaosAgent: " + agentName);
+  }
+
+  //WATCHERS: Below are the Watches used by ChaosAgent
+
+  /***
+   * Watcher for notifying if any task is assigned to agent or not,
+   * by seeking if any Node is being added to agent as Child.
+   */
+  Watcher newTaskCreatedWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      if (watchedEvent.getType() == Event.EventType.NodeChildrenChanged) {
+        assert (ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName).equals(watchedEvent.getPath());
+
+        LOG.info("Change in Tasks Node, checking for Tasks again.");
+        getTasks();
+      }
+
+    }
+  };
+
+  //CALLBACKS: Below are the Callbacks used by Chaos Agent
+
+  /**
+   * Callback used while setting status of a given task, Logs given status.
+   */
+  AsyncCallback.StatCallback setStatusOfTaskZNodeCallback = (rc, path, ctx, stat) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        // Connection to the server was lost while setting status; setting again.
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        setStatusOfTaskZNode(path, (String) ctx);
+        break;
+
+      case OK:
+        LOG.info("Status of Task has been set");
+        break;
+
+      case NONODE:
+        LOG.error("Chaos Agent status node does not exist: "
+          + "check for ZNode directory structure again.");
+        break;
+
+      default:
+        LOG.error("Error while setting status of task ZNode: " +
+          path, KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating a Persistent ZNode; tries to create the
+   * ZNode again if the connection was lost in the previous try.
+   */
+  AsyncCallback.StringCallback createZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Persistent ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used while creating an Ephemeral ZNode; tries to create the ZNode again
+   * if the connection was lost in the previous try.
+   */
+  AsyncCallback.StringCallback createEphemeralZNodeCallback = (rc, path, ctx, name) -> {
+    switch (KeeperException.Code.get(rc)) {
+      case CONNECTIONLOSS:
+        try {
+          recreateZKConnection();
+        } catch (Exception e) {
+          break;
+        }
+        createEphemeralZNode(path, (byte[]) ctx);
+        break;
+      case OK:
+        LOG.info("ZNode created : " + path);
+        break;
+      case NODEEXISTS:
+        LOG.warn("ZNode already registered: " + path);
+        break;
+      default:
+        LOG.error("Error occurred while creating Ephemeral ZNode: " + path,
+          KeeperException.create(KeeperException.Code.get(rc), path));
+    }
+  };
+
+  /**
+   * Callback used by getTasksForAgentCallback while getting a command.
+   * After getting the command successfully, it executes the command and
+   * sets its status according to the command type.
+   */
+  AsyncCallback.DataCallback getTaskForExecutionCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server was lost while getting the task; getting data again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          zk.getData(path,
+            false,
+            getTaskForExecutionCallback,
+            ctx);
+          break;
+        case OK:
+          String cmd = new String(data);
+          LOG.info("Executing command : " + cmd);
+          String status = ChaosConstants.TASK_COMPLETION_STRING;
+          try {
+            String user = conf.get(ChaosConstants.CHAOSAGENT_SHELL_USER,
+              ChaosConstants.DEFAULT_SHELL_USER);
+            switch (cmd.substring(0, 4)) {
+              case "bool":
+                String ret = execWithRetries(user, cmd.substring(4)).getSecond();
+                status = Boolean.toString(ret.length() > 0);
+                break;
+
+              case "exec":
+                execWithRetries(user, cmd.substring(4));
+                break;
+
+              default:
+                LOG.error("Unknown Command Type");
+                status = ChaosConstants.TASK_ERROR_STRING;
+            }
+          } catch (IOException e) {
+            LOG.error("Got error while executing command : " + cmd +
+              " On agent : " + agentName + " Error : " + e);
+            status = ChaosConstants.TASK_ERROR_STRING;
+          }
+
+          try {
+            setStatusOfTaskZNode(path, status);
+            Thread.sleep(ChaosConstants.SET_STATUS_SLEEP_TIME);
+          } catch (InterruptedException e) {
+            LOG.error("Error occurred after setting status: " + e);
+          }
+          break;
+
+        default:
+          LOG.error("Error occurred while getting data",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /**
+   * Callback used while getting tasks for the agent. If the call succeeds,
+   * it creates a separate thread for each child to execute the given tasks in parallel.
+   */
+  AsyncCallback.ChildrenCallback getTasksForAgentCallback = new AsyncCallback.ChildrenCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, List<String> children) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to the server has been lost, getting tasks again.
+          try {
+            recreateZKConnection();
+          } catch (Exception e) {
+            break;
+          }
+          getTasks();
+          break;
+
+        case OK:
+          if (children != null) {
+            try {
+
+              LOG.info("Executing each task as a separate thread");
+              List<Thread> tasksList = new ArrayList<>();
+              for (String task : children) {
+                String threadName = agentName + "_" + task;
+                Thread t = new Thread(() -> {
+
+                  LOG.info("Executing task : " + task + " of agent : " + agentName);
+                  zk.getData(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName +
+                      ChaosConstants.ZNODE_PATH_SEPARATOR + task,
+                    false,
+                    getTaskForExecutionCallback,
+                    task);
+
+                });
+                t.setName(threadName);
+                t.start();
+                tasksList.add(t);
+              }
+
+              for (Thread thread : tasksList) {
+                thread.join();
+              }
+            } catch (InterruptedException e) {
+              LOG.error("Error scheduling next task for agent : " + agentName +
+                " Error : " + e);
+            }
+          }
+          break;
+
+        default:
+          LOG.error("Error occurred while getting task",
+            KeeperException.create(KeeperException.Code.get(rc), path));
+      }
+    }
+  };
+
+  /**
+   * Creates a PERSISTENT ZNode with the given path and data.
+   * @param path Path at which to create the ZNode
+   * @param data Data to put under the ZNode
+   */
+  public void createZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.PERSISTENT,
+      createZNodeCallback,
+      data);
+  }
+
+  /**
+   * Creates an EPHEMERAL ZNode with the given path and data.
+   * @param path Path at which to create the ephemeral ZNode
+   * @param data Data to put under the ZNode
+   */
+  public void createEphemeralZNode(String path, byte[] data) {
+    zk.create(path,
+      data,
+      ZooDefs.Ids.OPEN_ACL_UNSAFE,
+      CreateMode.EPHEMERAL,
+      createEphemeralZNodeCallback,
+      data);
+  }
+
+  /**
+   * Checks if given ZNode exists, if not creates a PERSISTENT ZNODE for same.
+   *
+   * @param path Path to check for ZNode
+   */
+  private void createIfZNodeNotExists(String path) {
+    try {
+      if (zk.exists(path,
+        false) == null) {
+        createZNode(path, new byte[0]);
+      }
+    } catch (KeeperException | InterruptedException e) {
+      LOG.error("Error checking given node : " + path + " " + e);
+    }
+  }
+
+  /**
+   * Sets the given status for the task ZNode.
+   *
+   * @param taskZNode ZNode to set status
+   * @param status Status value
+   */
+  public void setStatusOfTaskZNode(String taskZNode, String status) {
+    LOG.info("Setting status of Task ZNode: " + taskZNode + " status : " + status);
+    zk.setData(taskZNode,
+      status.getBytes(),
+      -1,
+      setStatusOfTaskZNodeCallback,
+      status);
+  }
+
+  /**
+   * Registers the ChaosAgent by checking for and creating the necessary ZNodes.
+   */
+  private void register() {
+    createIfZNodeNotExists(ChaosConstants.CHAOS_TEST_ROOT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE);
+    createIfZNodeNotExists(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName);
+
+    createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+      ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+  }
+
+  /**
+   * Gets tasks for execution: sets a watch on this host's ZNode and
+   * waits for tasks to be assigned. getTasksForAgentCallback
+   * handles execution of the tasks.
+   */
+  private void getTasks() {
+    LOG.info("Getting Tasks for Agent: " + agentName + " and setting watch for new Tasks");
+    zk.getChildren(ChaosConstants.CHAOS_AGENT_STATUS_PERSISTENT_ZNODE +
+        ChaosConstants.ZNODE_PATH_SEPARATOR + agentName,
+      newTaskCreatedWatcher,
+      getTasksForAgentCallback,
+      null);
+  }
+
+  /**
+   * Executes the command with retries as the given user.
+   * Uses LocalShell to execute a command.
+   *
+   * @param user user name, default none
+   * @param cmd Command to execute
+   * @return A pair of Exit Code and Shell output
+   * @throws IOException Exception while executing shell command
+   */
+  private Pair<Integer, String> execWithRetries(String user, String cmd) throws IOException {
+    RetryCounter retryCounter = retryCounterFactory.create();
+    while (true) {
+      try {
+        return exec(user, cmd);
+      } catch (IOException e) {
+        retryOrThrow(retryCounter, e, user, cmd);
+      }
+      try {
+        retryCounter.sleepUntilNextRetry();
+      } catch (InterruptedException e) {
+        LOG.warn("Sleep Interrupted: " + e);
+      }
+    }
+  }
+
+  private Pair<Integer, String> exec(String user, String cmd) throws IOException {
+    LOG.info("Executing Shell command: " + cmd + " , user: " + user);
+
+    LocalShell shell = new LocalShell(user, cmd);
+    try {
+      shell.execute();
+    } catch (Shell.ExitCodeException e) {
+      String output = shell.getOutput();
+      throw new Shell.ExitCodeException(e.getExitCode(), "stderr: " + e.getMessage()
+        + ", stdout: " + output);
+    }
+    LOG.info("Executed Shell command, exit code: " + shell.getExitCode() +
+      " , output:" + shell.getOutput());
+
+    return new Pair<>(shell.getExitCode(), shell.getOutput());
+  }
+
+  private <E extends Exception> void retryOrThrow(RetryCounter retryCounter, E ex,
+    String user, String cmd) throws E {
+    if (retryCounter.shouldRetry()) {
+      LOG.warn("Local command: " + cmd + " , user:" + user
+        + " failed at attempt " + retryCounter.getAttemptTimes() + ". Retrying until maxAttempts: "
+        + retryCounter.getMaxAttempts() + ". Exception: " + ex.getMessage());
+      return;
+    }
+    throw ex;
+  }
+
+  private boolean isConnected() {
+    return connected;
+  }
+
+  @Override
+  public void close() throws IOException {
+    LOG.info("Closing ZooKeeper Connection for Chaos Agent : " + agentName);
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error while closing ZooKeeper Connection.");
+    }
+  }
+
+  @Override
+  public void run() {
+    try {
+      LOG.info("Running Chaos Agent on : " + agentName);
+      while (!this.isConnected()) {
+        Thread.sleep(100);
+      }
+      this.getTasks();
+      while (!stopChaosAgent.get()) {
+        Thread.sleep(500);
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error while running Chaos Agent");
+    }
+
+  }
+
+  @Override
+  public void process(WatchedEvent watchedEvent) {
+    LOG.info("Processing event: " + watchedEvent.toString());
+    if (watchedEvent.getType() == Event.EventType.None) {
+      switch (watchedEvent.getState()) {
+        case SyncConnected:
+          connected = true;
+          break;
+        case Disconnected:
+          connected = false;
+          break;
+        case Expired:
+          connected = false;
+          LOG.error("Session expired, creating again");
+          try {
+            createZKConnection(null);
+          } catch (IOException e) {
+            LOG.error("Error creating ZooKeeper connection");
+          }
+          break;
+        default:
+          LOG.error("Unknown State");
+          break;
+      }
+    }
+  }
+
+  private void recreateZKConnection() throws Exception {
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error closing ZK connection : " + e);
+      throw new RuntimeException(e);
+    } finally {
+      try {
+        createZKConnection(newTaskCreatedWatcher);
+        createEphemeralZNode(ChaosConstants.CHAOS_AGENT_REGISTRATION_EPIMERAL_ZNODE +
+          ChaosConstants.ZNODE_PATH_SEPARATOR + agentName, new byte[0]);
+      } catch (IOException e) {
+        LOG.error("Error creating new ZK Connection for agent: " + agentName + " " + e);
+        throw new RuntimeException(e);
+      }
+    }
+  }
+
+  /**
+   * Executes Command locally.
+   */
+  protected class LocalShell extends Shell.ShellCommandExecutor {
+
+    private String user;
+    private String execCommand;
+
+    public LocalShell(String user, String execCommand) {
+      super(new String[]{execCommand});
+      this.user = user;
+      this.execCommand = execCommand;
+    }
+
+    public LocalShell(String[] execString, File dir, Map<String, String> env, long timeout) {

Review comment:
       Is this being used?

##########
File path: hbase-it/src/test/java/org/apache/hadoop/hbase/ChaosZKClient.java
##########
@@ -0,0 +1,338 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hbase;
+
+import java.io.IOException;
+
+import org.apache.hadoop.hbase.util.Threads;
+import org.apache.yetus.audience.InterfaceAudience;
+import org.apache.zookeeper.AsyncCallback;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.ZooKeeper;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@InterfaceAudience.Private
+public class ChaosZKClient {
+
+  private static final Logger LOG = LoggerFactory.getLogger(ChaosZKClient.class.getName());
+  private static final String CHAOS_AGENT_PARENT_ZNODE = "/hbase/chaosAgents";
+  private static final String CHAOS_AGENT_STATUS_ZNODE = "/hbase/chaosAgentTaskStatus";
+  private static final String ZNODE_PATH_SEPARATOR = "/";
+  private static final String TASK_PREFIX = "task_";
+  private static final String TASK_ERROR_STRING = "error";
+  private static final String TASK_COMPLETION_STRING = "done";
+  private static final String TASK_BOOLEAN_TRUE = "true";
+  private static final String TASK_BOOLEAN_FALSE = "false";
+  private static final String CONNECTION_LOSS = "ConnectionLoss";
+  private static final int SESSION_TIMEOUT_ZK = 10 * 60 * 1000;
+  private static final int TASK_EXECUTION_TIMEOUT = 5 * 60 * 1000;
+  private volatile String taskStatus = null;
+
+  private String quorum;
+  private ZooKeeper zk;
+
+  public ChaosZKClient(String quorum) {
+    this.quorum = quorum;
+    try {
+      this.createNewZKConnection();
+    } catch (IOException e) {
+      LOG.error("Error creating ZooKeeper Connection: " + e);
+    }
+  }
+
+  /**
+   * Creates connection with ZooKeeper
+   * @throws IOException when not able to create connection properly
+   */
+  public void createNewZKConnection() throws IOException {
+    Watcher watcher = new Watcher() {
+      @Override
+      public void process(WatchedEvent watchedEvent) {
+        LOG.info("Created ZooKeeper Connection For executing task");
+      }
+    };
+
+    this.zk = new ZooKeeper(quorum, SESSION_TIMEOUT_ZK, watcher);
+  }
+
+  /**
+   * Checks if ChaosAgent is running or not on target host by checking its ZNode.
+   * @param hostname hostname to check for chaosagent
+   * @return true/false whether agent is running or not
+   */
+  private boolean isChaosAgentRunning(String hostname) {
+    try {
+      return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+        false) != null;
+    } catch (KeeperException e) {
+      if (e.toString().contains(CONNECTION_LOSS)) {
+        recreateZKConnection();
+        try {
+          return zk.exists(CHAOS_AGENT_PARENT_ZNODE + ZNODE_PATH_SEPARATOR + hostname,
+            false) != null;
+        } catch (KeeperException  | InterruptedException ie) {
+          LOG.error("ERROR " + ie);
+        }
+      }
+    } catch (InterruptedException e) {
+      LOG.error("Error checking for given hostname: " + hostname + " ERROR: " + e);
+    }
+    return false;
+  }
+
+  /**
+   * Creates tasks for target hosts by creating ZNodes.
+   * Waits a limited amount of time for the task to complete.
+   * @param taskObject Object data represents command
+   * @return returns status
+   */
+  public String submitTask(final TaskObject taskObject) {
+    if (isChaosAgentRunning(taskObject.getTaskHostname())) {
+      LOG.info("Creating task node");
+      zk.create(CHAOS_AGENT_STATUS_ZNODE + ZNODE_PATH_SEPARATOR +
+          taskObject.getTaskHostname() + ZNODE_PATH_SEPARATOR + TASK_PREFIX,
+        taskObject.getCommand().getBytes(),
+        ZooDefs.Ids.OPEN_ACL_UNSAFE,
+        CreateMode.EPHEMERAL_SEQUENTIAL,
+        submitTaskCallback,
+        taskObject);
+      long start = System.currentTimeMillis();
+
+      while ((System.currentTimeMillis() - start) < TASK_EXECUTION_TIMEOUT) {
+        if (taskStatus != null) {
+          return taskStatus;
+        }
+        Threads.sleep(500);
+      }
+    } else {
+      LOG.info("ChaosAgent is not running on host: " + taskObject.getTaskHostname());
+    }
+    return TASK_ERROR_STRING;
+  }
+
+  /**
+   * To get status of task submitted
+   * @param path path at which to get status
+   * @param ctx path context
+   */
+  private void getStatus(String path , Object ctx) {
+    LOG.info("Getting Status of task: " + path);
+    zk.getData(path,
+      false,
+      getStatusCallback,
+      ctx);
+  }
+
+  /**
+   * Set a watch on task submitted
+   * @param name ZNode name to set a watch
+   * @param taskObject context for ZNode name
+   */
+  private void setStatusWatch(String name, TaskObject taskObject) {
+    LOG.info("Checking for ZNode and Setting watch for task : " + name);
+    zk.exists(name,
+      setStatusWatcher,
+      setStatusWatchCallback,
+      taskObject);
+  }
+
+  /**
+   * Delete task after getting its status
+   * @param path path to delete ZNode
+   */
+  private void deleteTask(String path) {
+    LOG.info("Deleting task: " + path);
+    zk.delete(path,
+      -1,
+      taskDeleteCallback,
+      null);
+  }
+
+  //WATCHERS:
+
+  /**
+   * Watcher to get notification whenever status of task changes.
+   */
+  Watcher setStatusWatcher = new Watcher() {
+    @Override
+    public void process(WatchedEvent watchedEvent) {
+      LOG.info("Setting status watch for task: " + watchedEvent.getPath());
+      if (watchedEvent.getType() == Event.EventType.NodeDataChanged) {
+        assert watchedEvent.getPath().contains(TASK_PREFIX);
+        getStatus(watchedEvent.getPath(), (Object) watchedEvent.getPath());
+
+      }
+    }
+  };
+
+  //CALLBACKS
+
+  AsyncCallback.DataCallback getStatusCallback = new AsyncCallback.DataCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while getting status of task, getting again
+          recreateZKConnection();
+          getStatus(path, ctx);
+          break;
+
+        case OK:
+          if (ctx != null) {
+
+            String status = new String(data);
+            taskStatus = status;
+            switch (status) {
+              case TASK_COMPLETION_STRING:
+              case TASK_BOOLEAN_TRUE:
+              case TASK_BOOLEAN_FALSE:
+                LOG.info("Task executed completely : Status --> " + status);
+                break;
+
+              case TASK_ERROR_STRING:
+                LOG.info("There was error while executing task : Status --> " + status);
+                break;
+
+              default:
+                LOG.warn("Status of task is undefined!! : Status --> " + status);
+            }
+
+            deleteTask(path);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while getting status of task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StatCallback setStatusWatchCallback = new AsyncCallback.StatCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, Stat stat) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //ConnectionLoss while setting watch on status ZNode, setting again.
+          recreateZKConnection();
+          setStatusWatch(path, (TaskObject) ctx);
+          break;
+
+        case OK:
+          if (stat != null) {
+            getStatus(path, null);
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while setting watch on task ZNode: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.StringCallback submitTaskCallback = new AsyncCallback.StringCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx, String name) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          // Connection to server was lost while submitting task, submitting again.
+          recreateZKConnection();
+          submitTask((TaskObject) ctx);
+          break;
+
+        case OK:
+          LOG.info("Task created : " + name);
+          setStatusWatch(name, (TaskObject) ctx);
+          break;
+
+        default:
+          LOG.error("Error submitting task: " + name + " ERROR:" +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+  AsyncCallback.VoidCallback taskDeleteCallback = new AsyncCallback.VoidCallback() {
+    @Override
+    public void processResult(int rc, String path, Object ctx) {
+      switch (KeeperException.Code.get(rc)) {
+        case CONNECTIONLOSS:
+          //Connectionloss while deleting task, deleting again
+          recreateZKConnection();
+          deleteTask(path);
+          break;
+
+        case OK:
+          LOG.info("Task Deleted successfully!");
+          LOG.info("Closing ZooKeeper Connection");
+          try {
+            zk.close();
+          } catch (InterruptedException e) {
+            LOG.error("Error while closing ZooKeeper Connection.");
+          }
+          break;
+
+        default:
+          LOG.error("ERROR while deleting task: " + path + " ERROR: " +
+            KeeperException.create(KeeperException.Code.get(rc)));
+      }
+    }
+  };
+
+
+  private void recreateZKConnection() {
+    try {
+      zk.close();
+    } catch (InterruptedException e) {
+      LOG.error("Error closing ZK connection : " + e);
+    } finally {
+      try {
+        createNewZKConnection();
+      } catch (IOException e) {
+        LOG.error("Error creating new ZK Connection for agent: " + e);

Review comment:
       same as others: log should have Exception as arg
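
       To illustrate the point, here is a minimal standalone sketch. It is not code from the patch: it uses java.util.logging as a stand-in so that it runs without SLF4J on the classpath, and the class name `LogWithThrowableDemo`, the logger name `chaos.demo` and the exception message are made up for the example. With the SLF4J logger the patch uses, the corresponding call would be `LOG.error("Error creating new ZK Connection for agent", e)`, since `error(String, Throwable)` takes the exception as its last argument.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogWithThrowableDemo {

  /** Collects log records in memory so they can be inspected afterwards. */
  static final class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() {
    }

    @Override
    public void close() {
    }
  }

  /** Logs the same failure both ways and returns the captured records. */
  static List<LogRecord> logBothWays() {
    Logger log = Logger.getLogger("chaos.demo"); // hypothetical logger name
    log.setUseParentHandlers(false); // keep the demo quiet on the console
    CapturingHandler handler = new CapturingHandler();
    log.addHandler(handler);
    try {
      Exception e = new IOException("connection refused");
      // Concatenation: only e.toString() ends up in the message, the stack trace is lost.
      log.log(Level.SEVERE, "Error creating new ZK connection: " + e);
      // Throwable as argument: the record keeps the exception, including its stack trace.
      log.log(Level.SEVERE, "Error creating new ZK connection", e);
    } finally {
      log.removeHandler(handler);
    }
    return handler.records;
  }

  public static void main(String[] args) {
    for (LogRecord r : logBothWays()) {
      System.out.println(r.getMessage() + " | thrown=" + r.getThrown());
    }
  }
}
```

       Only the second call leaves a throwable attached to the record, so only that form lets the logging backend print the stack trace.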







[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679356481


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 32s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 17s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 19s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 55s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 12s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 16s |  root in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for patch  |
   | -1 :x: |  mvninstall  |   3m 54s |  root in the patch failed.  |
   | +1 :green_heart: |  compile  |   2m 51s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 51s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 50s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 17s |  hbase-it in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 14s |  root in the patch failed.  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 156m 19s |  root in the patch failed.  |
   |  |   | 186m 38s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.12 Server=19.03.12 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 8c8f7a889061 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 6ad73b9668 |
   | Default Java | 2020-01-14 |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-root.txt |
   | mvninstall | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-jdk11-hadoop3-check/output/patch-mvninstall-root.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-it.txt |
   | javadoc | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-root.txt |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/artifact/yetus-jdk11-hadoop3-check/output/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/testReport/ |
   | Max. process+thread count | 3709 (vs. ulimit of 12500) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-748495765


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 48s |  master passed  |
   | +1 :green_heart: |  compile  |   2m 23s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 47s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 20s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 32s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 25s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 25s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 38s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   2m 22s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 332m 41s |  root in the patch failed.  |
   |  |   | 366m 26s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux b65e3e284a73 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f600856a3b |
   | Default Java | AdoptOpenJDK-1.8.0_232-b09 |
   | unit | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-root.txt |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/testReport/ |
   | Max. process+thread count | 6055 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/9/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] mnpoonia commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
mnpoonia commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-679822088


   @lokiore - The test failures don't look related, but it would still be good to confirm whether these tests also fail locally without this patch.





[GitHub] [hbase] Apache-HBase commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-743925336


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 42s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 34s |  master passed  |
   | +1 :green_heart: |  compile  |   3m  5s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   7m  3s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   3m 12s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 17s |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 57s |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 57s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 49s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   3m 12s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 177m 20s |  root in the patch passed.  |
   |  |   | 217m 51s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/2299 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 8708c574220d 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / d50816fe44 |
   | Default Java | AdoptOpenJDK-11.0.6+10 |
   |  Test Results | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/testReport/ |
   | Max. process+thread count | 6621 (vs. ulimit of 30000) |
   | modules | C: hbase-it . U: . |
   | Console output | https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-2299/7/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   





[GitHub] [hbase] virajjasani commented on pull request #2299: HBASE-24620 : Add a ClusterManager which submits command to ZooKeeper and its Agent which picks and execute those Commands.

Posted by GitBox <gi...@apache.org>.
virajjasani commented on pull request #2299:
URL: https://github.com/apache/hbase/pull/2299#issuecomment-695803576


   Let me re-build.

