You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/01/06 18:13:48 UTC

[GitHub] [accumulo-testing] milleruntime opened a new pull request #184: Fix broken agitator scripts

milleruntime opened a new pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184


   * Create bash functions to start agitators
   * Drop broken perl scripts in favor of bash functions
   * Add function to call ClusterConfigParser to read cluster.yaml
   * Make agitators only log to one file


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783176112



##########
File path: bin/agitator
##########
@@ -40,60 +115,84 @@ function start_agitator() {
   else
     echo >&2 "The agitator requires pssh/parallel-ssh to be installed. Aborting."; exit 1;
   fi
+  ## read configuration into env variables
+  read_cluster_conf
 
   mkdir -p "${at_home}/logs"
   log_base="${at_home}/logs/$(date +%Y%m%d%H%M%S)_$(hostname)"
-  libexec="${at_home}/libexec"
-  master_log="${log_base}_master-agitator"
-  tserver_log="${log_base}_tserver-agitator"
-  datanode_log="${log_base}_datanode-agitator"
-  master_cmd="nohup ${libexec}/master-agitator.pl $AGTR_MASTER_KILL_SLEEP_TIME $AGTR_MASTER_RESTART_SLEEP_TIME"
-  tserver_cmd="nohup ${libexec}/tserver-agitator.pl $AGTR_TSERVER_KILL_SLEEP_TIME $AGTR_TSERVER_RESTART_SLEEP_TIME $AGTR_TSERVER_MIN_KILL $AGTR_TSERVER_MAX_KILL"
-  datanode_cmd="nohup ${libexec}/datanode-agitator.pl $AGTR_DATANODE_KILL_SLEEP_TIME $AGTR_DATANODE_RESTART_SLEEP_TIME $HADOOP_HOME $AGTR_DATANODE_MIN_KILL $AGTR_DATANODE_MAX_KILL"
+  manager_log="${log_base}_manager-agitator.log"
+  tserver_log="${log_base}_tserver-agitator.log"
+  datanode_log="${log_base}_datanode-agitator.log"
+  datanode_kill_cmd="pkill -9 -f '[p]roc_datanode'"
+  datanode_start_cmd="$HADOOP_HOME/bin/hdfs --daemon start datanode"
+  manager_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main manager'"
+  manager_start_cmd="$ACCUMULO_HOME/bin/accumulo-service manager start"
+  tserver_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main tserver'"
+  tserver_start_cmd="$ACCUMULO_HOME/bin/accumulo-service tserver start"
+
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
 
-  if [[ $AGITATOR_USER == "$AGTR_ACCUMULO_USER" ]]; then
-    echo "Running master-agitator and tserver-agitator as $AGITATOR_USER"
-    $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
-  else
-    echo "Running master-agitator and tserver-agitator as $AGTR_ACCUMULO_USER using sudo."
-    sudo --preserve-env=PSSH -u "$AGTR_ACCUMULO_USER" $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    sudo -u "$AGTR_ACCUMULO_USER" $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
+  if [[ $AGITATOR_USER != "$AGTR_ACCUMULO_USER" ]]; then
+    sudo -i -u "$AGTR_ACCUMULO_USER"
   fi
-  if [[ $AGITATOR_USER == "$AGTR_HDFS_USER" ]]; then
-    echo "Running datanode-agitator as $AGITATOR_USER"
-    $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
-  else
-    echo "Running datanode-agitator as $AGTR_HDFS_USER using sudo."
-    sudo -u "$AGTR_HDFS_USER" $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
+  echo "Starting manager and tserver agitation as $(whoami)"
+  start_app_agitator manager "$AGTR_MASTER_KILL_SLEEP_TIME" "$AGTR_MASTER_RESTART_SLEEP_TIME" 1 1 "$manager_start_cmd" "$manager_kill_cmd" > "$manager_log" 2>&1 &

Review comment:
       Are these variables, with "master" in them, `"$AGTR_MASTER_KILL_SLEEP_TIME"` and `"$AGTR_MASTER_RESTART_SLEEP_TIME"` still valid?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] ctubbsii commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1007794423


   > Here is some documentation on the nameref option. I had never used it before, pretty cool.
   
   I'm not familiar with the nameref flag. It looks a bit like setting pointers, which seems... harder to grok and more prone to error. Also, bash docs say you can't use it with arrays. I haven't had a chance to look at your updates yet, to see how you're using it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783198470



##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then

Review comment:
       Good idea.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780485669



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       Thanks for the tips. I already refactored the script to only use one function for all the agitators (except the hdfs one which still uses perl, that can be a follow on). I found a way to pass a variable nameref to a funtion to set the array for the hosts. Let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r782494051



##########
File path: bin/agitator
##########
@@ -30,6 +30,80 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation $kill_sleep_time $restart_sleep_time $min_kill $max_kill for ${hosts_array[*]}"
+  while true; do
+    echo "$T Sleeping for $kill_sleep_time minutes"
+    sleep $((kill_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    if ((max_kill == 1)) ; then
+      node_to_kill=${hosts_array[0]}
+      echo "$T Killing $app_name at $node_to_kill"
+      ssh "$node_to_kill" "$kill_cmd"
+    else
+      local num_to_kill=$((min_kill + RANDOM % max_kill))
+      # get the random nodes to kill
+      local count=0
+      while [[ $count -lt $num_to_kill ]]; do
+        randomHostIndex=$((1 + RANDOM % num_hosts))
+        node_to_kill=${hosts_array[randomHostIndex]}
+        # only add host to the array if its not already there
+        if [[ ! " ${nodes_to_kill_array[*]} " =~ $node_to_kill ]]; then
+          nodes_to_kill_array[count]=$node_to_kill
+        fi
+        count=${#nodes_to_kill_array[@]}
+      done
+      echo "$T Killing $count $app_name nodes"
+      for i in "${nodes_to_kill_array[@]}"; do
+        ssh "$i" "$kill_cmd"
+      done
+    fi
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    echo "$T Sleeping for $restart_sleep_time minutes."
+    sleep $((restart_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    if ((max_kill == 1)) ; then
+      echo "$T Restarting $app_name at $node_to_kill"
+      ssh "$node_to_kill" "bash -c '${ENV_VARS} $start_cmd'"
+    else
+      for i in "${nodes_to_kill_array[@]}"; do
+        echo "$T Restarting $app_name node at ${i}"
+        ssh "$i" "bash -c '${ENV_VARS} $start_cmd'"
+      done
+    fi
+  done
+}
+
 function start_agitator() {
   ## check that pssh is installed, falling back to parallel-ssh if needed
   ## make sure to export it, so it can be seen inside the agitator perl script

Review comment:
       Might need to remove/edit this comment to remove reference to the perl script.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783131598



##########
File path: bin/agitator
##########
@@ -30,6 +30,80 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation $kill_sleep_time $restart_sleep_time $min_kill $max_kill for ${hosts_array[*]}"
+  while true; do
+    echo "$T Sleeping for $kill_sleep_time minutes"
+    sleep $((kill_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    if ((max_kill == 1)) ; then
+      node_to_kill=${hosts_array[0]}
+      echo "$T Killing $app_name at $node_to_kill"
+      ssh "$node_to_kill" "$kill_cmd"
+    else
+      local num_to_kill=$((min_kill + RANDOM % max_kill))
+      # get the random nodes to kill
+      local count=0
+      while [[ $count -lt $num_to_kill ]]; do
+        randomHostIndex=$((1 + RANDOM % num_hosts))
+        node_to_kill=${hosts_array[randomHostIndex]}
+        # only add host to the array if its not already there
+        if [[ ! " ${nodes_to_kill_array[*]} " =~ $node_to_kill ]]; then
+          nodes_to_kill_array[count]=$node_to_kill
+        fi
+        count=${#nodes_to_kill_array[@]}
+      done
+      echo "$T Killing $count $app_name nodes"
+      for i in "${nodes_to_kill_array[@]}"; do
+        ssh "$i" "$kill_cmd"
+      done
+    fi
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    echo "$T Sleeping for $restart_sleep_time minutes."
+    sleep $((restart_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    if ((max_kill == 1)) ; then
+      echo "$T Restarting $app_name at $node_to_kill"
+      ssh "$node_to_kill" "bash -c '${ENV_VARS} $start_cmd'"
+    else
+      for i in "${nodes_to_kill_array[@]}"; do
+        echo "$T Restarting $app_name node at ${i}"
+        ssh "$i" "bash -c '${ENV_VARS} $start_cmd'"
+      done
+    fi
+  done
+}
+
 function start_agitator() {
   ## check that pssh is installed, falling back to parallel-ssh if needed
   ## make sure to export it, so it can be seen inside the agitator perl script

Review comment:
       Whoops, I missed that one :+1: 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1008950421


   > > Here is some documentation on the nameref option. I had never used it before, pretty cool.
   > 
   > I'm not familiar with the nameref flag. It looks a bit like setting pointers, which seems... harder to grok and more prone to error. Also, bash docs say you can't use it with arrays. I haven't had a chance to look at your updates yet, to see how you're using it.
   
   I didn't see that in the bash docs but with bash there are always more ways to do something wrong than right. If `nameref` shouldn't be used with arrays, then that is interesting because I originally found it in a Stack Overflow post about using it with arrays. https://stackoverflow.com/questions/10582763/how-to-return-an-array-in-bash-without-using-globals 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r782497118



##########
File path: bin/agitator
##########
@@ -30,6 +30,80 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation $kill_sleep_time $restart_sleep_time $min_kill $max_kill for ${hosts_array[*]}"

Review comment:
       It might be helpful to make this more descriptive. Its hard to tell what the output means without looking at the script. Maybe something to describe what each of the 4 numbers in the message refer to.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783166998



##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation. Kill every $kill_sleep_time minutes, restart every $restart_sleep_time minutes."
+  echo "$T Will randomly kill $min_kill to $max_kill from the following: ${hosts_array[*]}"

Review comment:
       Looks good. Minor change that made it a bit more clear to me but either way looks good.
   ```suggestion
     echo "$T Will randomly kill between $min_kill and $max_kill of the following: ${hosts_array[*]}"
   ```
   An example of this reads: `Will randomly kill between 1 and 1 of the following: localhost`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783166998



##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation. Kill every $kill_sleep_time minutes, restart every $restart_sleep_time minutes."
+  echo "$T Will randomly kill $min_kill to $max_kill from the following: ${hosts_array[*]}"

Review comment:
       Looks good. Minor suggestion that made it a bit more clear to me but either way looks good.
   ```suggestion
     echo "$T Will randomly kill between $min_kill and $max_kill of the following: ${hosts_array[*]}"
   ```
   An example of this reads: `Will randomly kill between 1 and 1 of the following: localhost`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780257738



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       The `$APP_HOSTS` variable contains a string of values separated by a space so I was concerned they would expand as separated params if I passed them as a variable. I will test this out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r781255687



##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array
+  get_app_hosts "$app_name" hosts_array
+  local num_hosts=${#hosts_array[@]}
+  local node_to_wack
+  declare -a nodes_to_wack_array

Review comment:
       I think I like `variable=()` better as well. And good catch with the misspelling. I was propagating what was there previously  but there is no reason to keep misspelling it haha.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780257738



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       The `$APP_HOSTS` variable contains a string of values separated by a space so I was concerned they would expain as separated param if I passed them as a variable. I will test this out.

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       The `$APP_HOSTS` variable contains a string of values separated by a space so I was concerned they would expand as separated param if I passed them as a variable. I will test this out.

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       The `$APP_HOSTS` variable contains a string of values separated by a space so I was concerned they would expand as separated params if I passed them as a variable. I will test this out.

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5

Review comment:
       What are the benefits of using shift 6 times? Personally I think this is a lot less user friendly then using the numbers for the parameters. If someone comes and looks at the script to figure out the params they have to do counting and also the shifting makes the ordering of setting the variables important.

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5

Review comment:
       I see why you are shifting; so you can set the last one as an array. I think I am going to create another function to set APP_HOSTS so this is more readable.

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       Thanks for the tips. I already refactored the script to only use one function for all the agitators (except the hdfs one which still uses perl, that can be a follow on). I found a way to pass a variable nameref to a funtion to set the array for the hosts. Let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1011324701


   @DomGarguilo I noticed a quirk while testing that I think could be a follow on task. For the stop agitator, I made the `pkill` kill the sleep first and then anything matching `agitator`. This works but I noticed that the script may run before being killed and introduces a race condition. Here is the log from the tserver agitator:
   <pre>
   20220112 12:37:19 Starting tserver agitation. Kill every 20 minutes, restart every 10 minutes.
   20220112 12:37:19 Will randomly kill between 1 and 1 of the following: localhost
   20220112 12:37:19 Sleeping for 20 minutes
   Terminated
   20220112 12:49:54 Killing tserver at localhost
   </pre>
   
   There should be a way to kill the agitator process and all its child processes (i.e. the sleep function) with one command.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1011331502


   > There should be a way to kill the agitator process and all its child processes (i.e. the sleep function) with one command.
   
   Good idea. I can start looking into that if you would like. @milleruntime 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r781272312



##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array
+  get_app_hosts "$app_name" hosts_array

Review comment:
       This is neat. It also seems less obscure or hazardous then using `nameref`. I will use this technique.
   
   I assume the space between the 2 `<` symbols was intentional? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] ctubbsii commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780462956



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       They will, but that's okay in the example I gave, because I'm basically passing them as varargs. You can either pass them as a single, quoted value with space separated items, and then put that into an array later:
   
   ```bash
   function printHosts() {
     local myarray=($1) # make the single argument turn into an array by not quoting so the spaces create separate items
     local x; for x in "${myarray[@]}"; do echo "$x"; done
   }
   myhosts="host1 host2 host3"
   printHosts "$myhosts" # quotes pass it as single arg
   ```
   
   Or, you can break them out like varargs (which is what I did above and in both the following examples):
   
   Varargs example using an unquoted space-delimited scalar:
   
   ```bash
   function printHosts() {
     local myarray=("$@") # put args into a new named array
     local x; for x in "${myarray[@]}"; do echo "$x"; done
   }
   myhosts="host1 host2 host3"
   printHosts $myhosts # no quotes, so pass it as 3 args
   ```
   
   Varargs example using an initial array where each element in the original array is passed as a separate argument:
   
   ```bash
   function printHosts() {
     local myarray=("$@") # put args into a new named array
     local x; for x in "${myarray[@]}"; do echo "$x"; done
   }
   myhosts=("host1" "host2" "host3") # already an array
   printHosts "${myhosts[@]}" # pass each element of the array as a separate arg
   ```
   
   (Note: all of these examples are simplified, to avoid the `shift` that picks off the arguments before the varargs hosts.)

##########
File path: bin/agitator
##########
@@ -40,57 +115,85 @@ function start_agitator() {
   else
     echo >&2 "The agitator requires pssh/parallel-ssh to be installed. Aborting."; exit 1;
   fi
+  ## read configuration into env variables
+  read_cluster_conf
 
   mkdir -p "${at_home}/logs"
   log_base="${at_home}/logs/$(date +%Y%m%d%H%M%S)_$(hostname)"
-  libexec="${at_home}/libexec"
-  master_log="${log_base}_master-agitator"
-  tserver_log="${log_base}_tserver-agitator"
-  datanode_log="${log_base}_datanode-agitator"
-  master_cmd="nohup ${libexec}/master-agitator.pl $AGTR_MASTER_KILL_SLEEP_TIME $AGTR_MASTER_RESTART_SLEEP_TIME"
-  tserver_cmd="nohup ${libexec}/tserver-agitator.pl $AGTR_TSERVER_KILL_SLEEP_TIME $AGTR_TSERVER_RESTART_SLEEP_TIME $AGTR_TSERVER_MIN_KILL $AGTR_TSERVER_MAX_KILL"
-  datanode_cmd="nohup ${libexec}/datanode-agitator.pl $AGTR_DATANODE_KILL_SLEEP_TIME $AGTR_DATANODE_RESTART_SLEEP_TIME $HADOOP_HOME $AGTR_DATANODE_MIN_KILL $AGTR_DATANODE_MAX_KILL"
+  manager_log="${log_base}_manager-agitator.log"
+  tserver_log="${log_base}_tserver-agitator.log"
+  datanode_log="${log_base}_datanode-agitator.log"
+  datanode_kill_cmd="pkill -9 -f '[p]roc_datanode'"
+  datanode_start_cmd="$HADOOP_HOME/bin/hdfs --daemon start datanode"
+  manager_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main manager'"
+  manager_start_cmd="$ACCUMULO_HOME/bin/accumulo-service manager start"
+  tserver_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main tserver'"
+  tserver_start_cmd="$ACCUMULO_HOME/bin/accumulo-service tserver start"
+
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
 
-  if [[ $AGITATOR_USER == "$AGTR_ACCUMULO_USER" ]]; then
-    echo "Running master-agitator and tserver-agitator as $AGITATOR_USER"
-    $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
-  else
-    echo "Running master-agitator and tserver-agitator as $AGTR_ACCUMULO_USER using sudo."
-    sudo --preserve-env=PSSH -u "$AGTR_ACCUMULO_USER" $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    sudo -u "$AGTR_ACCUMULO_USER" $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
+  if [[ $AGITATOR_USER != "$AGTR_ACCUMULO_USER" ]]; then
+    sudo -i -u "$AGTR_ACCUMULO_USER"
   fi
-  if [[ $AGITATOR_USER == "$AGTR_HDFS_USER" ]]; then
-    echo "Running datanode-agitator as $AGITATOR_USER"
-    $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
-  else
-    echo "Running datanode-agitator as $AGTR_HDFS_USER using sudo."
-    sudo -u "$AGTR_HDFS_USER" $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
+  echo "Starting manager and tserver agitation as $(whoami)"
+  start_app_agitator manager "$AGTR_MASTER_KILL_SLEEP_TIME" "$AGTR_MASTER_RESTART_SLEEP_TIME" 1 1 "$manager_start_cmd" "$manager_kill_cmd" > "$manager_log" 2>&1 &
+  start_app_agitator tserver "$AGTR_TSERVER_KILL_SLEEP_TIME" "$AGTR_TSERVER_RESTART_SLEEP_TIME" "$AGTR_TSERVER_MIN_KILL" "$AGTR_TSERVER_MAX_KILL" "$tserver_start_cmd" "$tserver_kill_cmd" > "$tserver_log" 2>&1 &
+
+  if [[ $AGITATOR_USER != "$AGTR_HDFS_USER" ]]; then
+    sudo -i -u "$AGTR_HDFS_USER"
   fi
+  echo "Running datanode agitator as $(whoami)"
+  start_app_agitator datanode "$AGTR_DATANODE_KILL_SLEEP_TIME" "$AGTR_DATANODE_RESTART_SLEEP_TIME" "$AGTR_DATANODE_MIN_KILL" "$AGTR_DATANODE_MAX_KILL" "$datanode_start_cmd" "$datanode_kill_cmd" > "${datanode_log}" 2>&1 &
 
   if ${AGTR_HDFS:-false} ; then
     agitator_log=${log_base}_hdfs-agitator
-    sudo -u "$AGTR_HDFS_SUPERUSER" nohup "${libexec}/hdfs-agitator.pl" --sleep "${AGTR_HDFS_SLEEP_TIME}" --hdfs-cmd "${AGTR_HDFS_COMMAND}" --superuser "${AGTR_HDFS_SUPERUSER}" >"${agitator_log}.out" 2>"${agitator_log}.err" &
+    sudo -u "$AGTR_HDFS_SUPERUSER" nohup "${at_home}/libexec/hdfs-agitator.pl" --sleep "${AGTR_HDFS_SLEEP_TIME}" --hdfs-cmd "${AGTR_HDFS_COMMAND}" --superuser "${AGTR_HDFS_SUPERUSER}" >"${agitator_log}.out" 2>"${agitator_log}.err" &
   fi
 }
 
 function stop_agitator() {
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
-  if [[ $AGITATOR_USER == "$AGTR_ACCUMULO_USER" ]];  then
-    echo "Stopping all processes matching 'datanode-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f datanode-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'hdfs-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f hdfs-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'agitator.pl' as $AGITATOR_USER"
-    pkill -f agitator.pl 2>/dev/null 2>/dev/null
-  else
-    echo "Stopping all processes matching 'datanode-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f datanode-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'hdfs-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f hdfs-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'agitator.pl' as $AGTR_ACCUMULO_USER"
-    sudo -u "$AGTR_ACCUMULO_USER" pkill -f agitator.pl 2>/dev/null
+  echo "Stopping all processes matching 'sleep' as $AGITATOR_USER"
+  pkill -f sleep 2>/dev/null
+  echo "Stopping all processes matching 'agitator' as $AGITATOR_USER"
+  pkill -f agitator 2>/dev/null
+}
+
+function parse_fail() {
+  echo "Failed to parse ${conf}/cluster.yaml"
+  exit 1
+}
+
+# Read the configuration from cluster.yaml - expects at least the manager and tservers
+function read_cluster_conf() {
+  conf="${ACCUMULO_HOME}/conf"
+  echo "Reading cluster config from ${conf}/cluster.yaml"
+  trap 'rm -f "$CONFIG_FILE"' EXIT
+  CONFIG_FILE=$(mktemp) || exit 1
+  accumulo org.apache.accumulo.core.conf.cluster.ClusterConfigParser "${conf}"/cluster.yaml > "$CONFIG_FILE" || parse_fail
+  . "$CONFIG_FILE"
+  rm -f "$CONFIG_FILE"
+
+  if [[ -z $MANAGER_HOSTS ]]; then
+    echo "ERROR: managers not found in ${conf}/cluster.yaml"
+    exit 1
+  fi
+  if [[ -z $TSERVER_HOSTS ]]; then
+    echo "ERROR: tservers not found in ${conf}/cluster.yaml"
+    exit 1
+  fi
+}
+
+# Given an app_name $1, set the array of hosts to the named reference passed in $2
+function get_app_hosts() {
+  local app_name=$1
+  local -n app_hosts=$2
+  if [[ "$app_name" == "manager" ]]; then
+    app_hosts=( $MANAGER_HOSTS )
+  fi
+
+  if [[ "$app_name" == "tserver" || "$app_name" == "datanode" ]]; then
+    app_hosts=( $TSERVER_HOSTS )
   fi

Review comment:
       The corresponding change here to my suggestion above could be:
   
   ```suggestion
     case "$1" in
       manager)  echo -n "$MANAGER_HOSTS" ;;
       tserver|datanode) echo -n "$TSERVER_HOSTS" ;;
       *) return 1 ;;
     esac
   ```
   
   The `-n` is so you don't capture a newline as part of the last host when `mapfile` reads it using a space delimiter.

##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array
+  get_app_hosts "$app_name" hosts_array
+  local num_hosts=${#hosts_array[@]}
+  local node_to_wack
+  declare -a nodes_to_wack_array

Review comment:
       My personal preference for declaring arrays is the `variable=()` syntax, rather than the `declare -a variable` syntax, only because I think the former is more obvious. The former seems modern to me and more consistent with how it is assigned/appended, and the latter seems dated. Also, `declare` makes local variables by default when used inside a function but global outside, which can be confusing, but it's obvious when doing `local variable=()`. Anyway, this is probably just personal preference.
   
   FWIW, "whack" is misspelled :smiley_cat: 

##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array
+  get_app_hosts "$app_name" hosts_array

Review comment:
       The use of a nameref in `get_app_hosts` is clever, but I think it's more complicated than necessary. It's much simpler to `echo` the hosts in `get_app_hosts` and simply capture that output into an array here:
   
   ```bash
     local hosts_array; hosts_array=($(get_app_hosts "$app_name"))
   ```
   
   This would be similar to Java:
   ```java
   String[] hostsArray = getAppHosts(appName).split(" ");
   ```
   
   Shellcheck won't like not quoting the output, even though that's what you want here. So the best way is to explicitly specify the delimiter when constructing the array, using `mapfile` (aka `readarray`):
   
   ```suggestion
     local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783040965



##########
File path: bin/agitator
##########
@@ -30,6 +30,80 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation $kill_sleep_time $restart_sleep_time $min_kill $max_kill for ${hosts_array[*]}"

Review comment:
       I thought about that but I didn't want to make it too long. I'll make it more descriptive. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783198082



##########
File path: bin/agitator
##########
@@ -40,60 +115,84 @@ function start_agitator() {
   else
     echo >&2 "The agitator requires pssh/parallel-ssh to be installed. Aborting."; exit 1;
   fi
+  ## read configuration into env variables
+  read_cluster_conf
 
   mkdir -p "${at_home}/logs"
   log_base="${at_home}/logs/$(date +%Y%m%d%H%M%S)_$(hostname)"
-  libexec="${at_home}/libexec"
-  master_log="${log_base}_master-agitator"
-  tserver_log="${log_base}_tserver-agitator"
-  datanode_log="${log_base}_datanode-agitator"
-  master_cmd="nohup ${libexec}/master-agitator.pl $AGTR_MASTER_KILL_SLEEP_TIME $AGTR_MASTER_RESTART_SLEEP_TIME"
-  tserver_cmd="nohup ${libexec}/tserver-agitator.pl $AGTR_TSERVER_KILL_SLEEP_TIME $AGTR_TSERVER_RESTART_SLEEP_TIME $AGTR_TSERVER_MIN_KILL $AGTR_TSERVER_MAX_KILL"
-  datanode_cmd="nohup ${libexec}/datanode-agitator.pl $AGTR_DATANODE_KILL_SLEEP_TIME $AGTR_DATANODE_RESTART_SLEEP_TIME $HADOOP_HOME $AGTR_DATANODE_MIN_KILL $AGTR_DATANODE_MAX_KILL"
+  manager_log="${log_base}_manager-agitator.log"
+  tserver_log="${log_base}_tserver-agitator.log"
+  datanode_log="${log_base}_datanode-agitator.log"
+  datanode_kill_cmd="pkill -9 -f '[p]roc_datanode'"
+  datanode_start_cmd="$HADOOP_HOME/bin/hdfs --daemon start datanode"
+  manager_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main manager'"
+  manager_start_cmd="$ACCUMULO_HOME/bin/accumulo-service manager start"
+  tserver_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main tserver'"
+  tserver_start_cmd="$ACCUMULO_HOME/bin/accumulo-service tserver start"
+
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
 
-  if [[ $AGITATOR_USER == "$AGTR_ACCUMULO_USER" ]]; then
-    echo "Running master-agitator and tserver-agitator as $AGITATOR_USER"
-    $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
-  else
-    echo "Running master-agitator and tserver-agitator as $AGTR_ACCUMULO_USER using sudo."
-    sudo --preserve-env=PSSH -u "$AGTR_ACCUMULO_USER" $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    sudo -u "$AGTR_ACCUMULO_USER" $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
+  if [[ $AGITATOR_USER != "$AGTR_ACCUMULO_USER" ]]; then
+    sudo -i -u "$AGTR_ACCUMULO_USER"
   fi
-  if [[ $AGITATOR_USER == "$AGTR_HDFS_USER" ]]; then
-    echo "Running datanode-agitator as $AGITATOR_USER"
-    $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
-  else
-    echo "Running datanode-agitator as $AGTR_HDFS_USER using sudo."
-    sudo -u "$AGTR_HDFS_USER" $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
+  echo "Starting manager and tserver agitation as $(whoami)"
+  start_app_agitator manager "$AGTR_MASTER_KILL_SLEEP_TIME" "$AGTR_MASTER_RESTART_SLEEP_TIME" 1 1 "$manager_start_cmd" "$manager_kill_cmd" > "$manager_log" 2>&1 &

Review comment:
       Yeah I haven't changed them in `env.sh` because I thought it might be a bigger change to migrate away from master. I was going to do that in a separate change but I can here since I am modifying both files. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783201333



##########
File path: bin/agitator
##########
@@ -40,60 +115,84 @@ function start_agitator() {
   else
     echo >&2 "The agitator requires pssh/parallel-ssh to be installed. Aborting."; exit 1;
   fi
+  ## read configuration into env variables
+  read_cluster_conf
 
   mkdir -p "${at_home}/logs"
   log_base="${at_home}/logs/$(date +%Y%m%d%H%M%S)_$(hostname)"
-  libexec="${at_home}/libexec"
-  master_log="${log_base}_master-agitator"
-  tserver_log="${log_base}_tserver-agitator"
-  datanode_log="${log_base}_datanode-agitator"
-  master_cmd="nohup ${libexec}/master-agitator.pl $AGTR_MASTER_KILL_SLEEP_TIME $AGTR_MASTER_RESTART_SLEEP_TIME"
-  tserver_cmd="nohup ${libexec}/tserver-agitator.pl $AGTR_TSERVER_KILL_SLEEP_TIME $AGTR_TSERVER_RESTART_SLEEP_TIME $AGTR_TSERVER_MIN_KILL $AGTR_TSERVER_MAX_KILL"
-  datanode_cmd="nohup ${libexec}/datanode-agitator.pl $AGTR_DATANODE_KILL_SLEEP_TIME $AGTR_DATANODE_RESTART_SLEEP_TIME $HADOOP_HOME $AGTR_DATANODE_MIN_KILL $AGTR_DATANODE_MAX_KILL"
+  manager_log="${log_base}_manager-agitator.log"
+  tserver_log="${log_base}_tserver-agitator.log"
+  datanode_log="${log_base}_datanode-agitator.log"
+  datanode_kill_cmd="pkill -9 -f '[p]roc_datanode'"
+  datanode_start_cmd="$HADOOP_HOME/bin/hdfs --daemon start datanode"
+  manager_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main manager'"
+  manager_start_cmd="$ACCUMULO_HOME/bin/accumulo-service manager start"
+  tserver_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main tserver'"
+  tserver_start_cmd="$ACCUMULO_HOME/bin/accumulo-service tserver start"
+
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
 
-  if [[ $AGITATOR_USER == "$AGTR_ACCUMULO_USER" ]]; then
-    echo "Running master-agitator and tserver-agitator as $AGITATOR_USER"
-    $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
-  else
-    echo "Running master-agitator and tserver-agitator as $AGTR_ACCUMULO_USER using sudo."
-    sudo --preserve-env=PSSH -u "$AGTR_ACCUMULO_USER" $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    sudo -u "$AGTR_ACCUMULO_USER" $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
+  if [[ $AGITATOR_USER != "$AGTR_ACCUMULO_USER" ]]; then
+    sudo -i -u "$AGTR_ACCUMULO_USER"
   fi
-  if [[ $AGITATOR_USER == "$AGTR_HDFS_USER" ]]; then
-    echo "Running datanode-agitator as $AGITATOR_USER"
-    $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
-  else
-    echo "Running datanode-agitator as $AGTR_HDFS_USER using sudo."
-    sudo -u "$AGTR_HDFS_USER" $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
+  echo "Starting manager and tserver agitation as $(whoami)"
+  start_app_agitator manager "$AGTR_MASTER_KILL_SLEEP_TIME" "$AGTR_MASTER_RESTART_SLEEP_TIME" 1 1 "$manager_start_cmd" "$manager_kill_cmd" > "$manager_log" 2>&1 &

Review comment:
       > I was going to do that in a separate change but I can here since I am modifying both files.
   
   Either way works. I just wanted to double check that it wasn't being overlooked.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1007692474


   Here is some documentation on the nameref option. I had never used it before, pretty cool.
   
   <pre>
   A variable can be assigned the nameref attribute using the -n option to the declare or local builtin commands (see Bash 
   Builtins) to create a nameref, or a reference to another variable. This allows variables to be manipulated indirectly. Whenever 
   the nameref variable is referenced, assigned to, unset, or has its attributes modified (other than using or changing the nameref 
   attribute itself), the operation is actually performed on the variable specified by the nameref variable’s value.
   </pre>
   
   https://www.gnu.org/software/bash/manual/html_node/Shell-Parameters.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] ctubbsii commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780462956



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       They will, but that's okay in the example I gave, because I'm basically passing them as varargs. You can either pass them as a single, quoted value with space separated items, and then put that into an array later:
   
   ```bash
   function printHosts() {
     local myarray=($1) # make the single argument turn into an array by not quoting so the spaces create separate items
     local x; for x in "${myarray[@]}"; do echo "$x"; done
   }
   myhosts="host1 host2 host3"
   printHosts "$myhosts" # quotes pass it as single arg
   ```
   
   Or, you can break them out like varargs (which is what I did above and in both the following examples):
   
   Varargs example using an unquoted space-delimited scalar:
   
   ```bash
   function printHosts() {
     local myarray=("$@") # put args into a new named array
     local x; for x in "${myarray[@]}"; do echo "$x"; done
   }
   myhosts="host1 host2 host3"
   printHosts $myhosts # no quotes, so pass it as 3 args
   ```
   
   Varargs example using an initial array where each element in the original array is passed as a separate argument:
   
   ```bash
   function printHosts() {
     local myarray=("$@") # put args into a new named array
     local x; for x in "${myarray[@]}"; do echo "$x"; done
   }
   myhosts=("host1" "host2" "host3") # already an array
   printHosts "${myhosts[@]}" # pass each element of the array as a separate arg
   ```
   
   (Note: all of these examples are simplified, to avoid the `shift` that picks off the arguments before the varargs hosts.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780283232



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5

Review comment:
       What are the benefits of using shift 6 times? Personally I think this is a lot less user friendly then using the numbers for the parameters. If someone comes and looks at the script to figure out the params they have to do counting and also the shifting makes the ordering of setting the variables important.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1006879195


   FYI I am currently working through `shellcheck` recommendations so please hold off on any of that type of feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783051921



##########
File path: bin/agitator
##########
@@ -30,6 +30,80 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation $kill_sleep_time $restart_sleep_time $min_kill $max_kill for ${hosts_array[*]}"
+  while true; do
+    echo "$T Sleeping for $kill_sleep_time minutes"
+    sleep $((kill_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    if ((max_kill == 1)) ; then
+      node_to_kill=${hosts_array[0]}
+      echo "$T Killing $app_name at $node_to_kill"
+      ssh "$node_to_kill" "$kill_cmd"
+    else
+      local num_to_kill=$((min_kill + RANDOM % max_kill))
+      # get the random nodes to kill
+      local count=0
+      while [[ $count -lt $num_to_kill ]]; do
+        randomHostIndex=$((1 + RANDOM % num_hosts))
+        node_to_kill=${hosts_array[randomHostIndex]}
+        # only add host to the array if its not already there
+        if [[ ! " ${nodes_to_kill_array[*]} " =~ $node_to_kill ]]; then
+          nodes_to_kill_array[count]=$node_to_kill
+        fi
+        count=${#nodes_to_kill_array[@]}
+      done
+      echo "$T Killing $count $app_name nodes"
+      for i in "${nodes_to_kill_array[@]}"; do
+        ssh "$i" "$kill_cmd"
+      done
+    fi
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    echo "$T Sleeping for $restart_sleep_time minutes."
+    sleep $((restart_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    if ((max_kill == 1)) ; then
+      echo "$T Restarting $app_name at $node_to_kill"
+      ssh "$node_to_kill" "bash -c '${ENV_VARS} $start_cmd'"
+    else
+      for i in "${nodes_to_kill_array[@]}"; do
+        echo "$T Restarting $app_name node at ${i}"
+        ssh "$i" "bash -c '${ENV_VARS} $start_cmd'"
+      done
+    fi
+  done
+}
+
 function start_agitator() {
   ## check that pssh is installed, falling back to parallel-ssh if needed
   ## make sure to export it, so it can be seen inside the agitator perl script

Review comment:
       Actually, there is still the one perl script that is still being used `hdfs-agitator.pl`. AFAIK its still working. I didn't see any errors but was unable to fully test just locally on my machine.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] ctubbsii commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1007794423


   > Here is some documentation on the nameref option. I had never used it before, pretty cool.
   
   I'm not familiar with the nameref flag. It looks a bit like setting pointers, which seems... harder to grok and more prone to error. Also, bash docs say you can't use it with arrays. I haven't had a chance to look at your updates yet, to see how you're using it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780257738



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       The `$APP_HOSTS` variable contains a string of values separated by a space so I was concerned they would expand as separated param if I passed them as a variable. I will test this out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780257738



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       The `$APP_HOSTS` variable contains a string of values separated by a space so I was concerned they would expain as separated param if I passed them as a variable. I will test this out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783179163



##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then

Review comment:
       Maybe a check that max >= min could be added too. Either to this condition or separate.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime merged pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime merged pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] ctubbsii commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r779881163



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5
+  ENV_VARS="ACCUMULO_HOME=${ACCUMULO_HOME} ZOOKEEPER_HOME=${ZOOKEEPER_HOME} HADOOP_HOME=${HADOOP_HOME} JAVA_HOME=${JAVA_HOME}"
+  if [[ -z $APP_HOSTS ]]; then
+    echo "ERROR: No hosts were found in env for ${app_name}"
+    exit 1
+  fi

Review comment:
       Based on my other suggestions, this could just check the size of the array

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5
+  ENV_VARS="ACCUMULO_HOME=${ACCUMULO_HOME} ZOOKEEPER_HOME=${ZOOKEEPER_HOME} HADOOP_HOME=${HADOOP_HOME} JAVA_HOME=${JAVA_HOME}"
+  if [[ -z $APP_HOSTS ]]; then
+    echo "ERROR: No hosts were found in env for ${app_name}"
+    exit 1
+  fi
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "${T} Running ${app_name} agitator ${kill_sleep_time} ${restart_sleep_time} ${min_kill} ${max_kill} for HOSTS=${APP_HOSTS}"
+  while true; do
+    echo "${T} Sleeping for ${kill_sleep_time} minutes"
+    sleep $((kill_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    numToWack=$((min_kill + RANDOM % max_kill))
+    nodeToWack=$(echo "${APP_HOSTS}" | cut -d" " -f1)
+    hostsArray=("${APP_HOSTS}")
+    numOfHosts=${#hostsArray[@]}
+    declare -a nodesToWack
+
+    if ((max_kill == 1)) ; then
+      echo "${T} Killing ${app_name}"
+      ssh "${nodeToWack}" "pkill -f '[ ]org.apache.accumulo.start.*${app_name}'"
+    else
+      # get the random nodes to kill
+      count=0
+      while [ $count -lt $numToWack ]; do

Review comment:
       Should use `[[ ]]` instead of `[ ]` to avoid any possible performance hit by calling `/usr/bin/[`, or any possible confusion because it looks like that's what it's doing. `[[ ]]` is built-in, but `[ ]` is a standalone executable that bash may or may not execute or replace with its own built-in interpretation (just like /usr/bin/echo).

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &

Review comment:
       These first three lines are duplicated with different parameters for the managers and the tservers. These can be passed as parameters to the `start_app_agitator` function instead.
   
   Also, if at all possible, you should avoid passing around variables in and out of functions using global variables. This is especially important since you're backgrounding the calls and then immediately editing the value of the global variable, which could create concurrency issues. You can declare variables local to a function with the `local` keyword, as in `local x; x=$1` to assign the first argument to the function to the local variable named `x`.
   
   Also, you don't need all the curly braces for scalar variables that aren't ambiguous in a string. They can add noise that affect readability.
   
   ```suggestion
     local agtr_user=$1
     start_app_agitator manager "$agtr_user" "$AGTR_MASTER_KILL_SLEEP_TIME" "$AGTR_MASTER_RESTART_SLEEP_TIME" 1 1 $MANAGER_HOSTS > "$manager_log" 2>&1 &
     start_app_agitator tserver "$agtr_user" "$AGTR_TSERVER_KILL_SLEEP_TIME" "$AGTR_TSERVER_RESTART_SLEEP_TIME" "$AGTR_TSERVER_MIN_KILL" "$AGTR_TSERVER_MAX_KILL" $TSERVER_HOSTS > "$tserver_log" 2>&1 &
   ```

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5
+  ENV_VARS="ACCUMULO_HOME=${ACCUMULO_HOME} ZOOKEEPER_HOME=${ZOOKEEPER_HOME} HADOOP_HOME=${HADOOP_HOME} JAVA_HOME=${JAVA_HOME}"
+  if [[ -z $APP_HOSTS ]]; then
+    echo "ERROR: No hosts were found in env for ${app_name}"
+    exit 1
+  fi
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "${T} Running ${app_name} agitator ${kill_sleep_time} ${restart_sleep_time} ${min_kill} ${max_kill} for HOSTS=${APP_HOSTS}"
+  while true; do
+    echo "${T} Sleeping for ${kill_sleep_time} minutes"
+    sleep $((kill_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    numToWack=$((min_kill + RANDOM % max_kill))
+    nodeToWack=$(echo "${APP_HOSTS}" | cut -d" " -f1)
+    hostsArray=("${APP_HOSTS}")

Review comment:
       don't forget to use local variables for anything you don't think should be set outside this function; if assigning a simple value to another local variable, you can do `local x=$y`. For arrays, you can do `local x=("${y[@]}")`. For assigning the result of an operation, you have to declare as local and assign it in two steps: `local x; x=$(echo y)`
   
   My previous suggestion already set APP_HOSTS to an array from the function's input arguments, so that could simplify some of this array logic in here.

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5

Review comment:
       ```suggestion
   # Requires that a list of hosts to be set in $APP_HOSTS
   function start_app_agitator() {
     local app_name=$1; shift
     local agtr_user=$1; shift
     local kill_sleep_time=$1; shift
     local restart_sleep_time=$1; shift
     local min_kill=$1; shift
     local max_kill=$1; shift
     local APP_HOSTS=("$@") # can keep this as an array
     echo "Starting $app_name agitation as $agtr_user for ${APP_HOSTS[*]}"
   ```

##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5
+  ENV_VARS="ACCUMULO_HOME=${ACCUMULO_HOME} ZOOKEEPER_HOME=${ZOOKEEPER_HOME} HADOOP_HOME=${HADOOP_HOME} JAVA_HOME=${JAVA_HOME}"
+  if [[ -z $APP_HOSTS ]]; then
+    echo "ERROR: No hosts were found in env for ${app_name}"
+    exit 1
+  fi
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "${T} Running ${app_name} agitator ${kill_sleep_time} ${restart_sleep_time} ${min_kill} ${max_kill} for HOSTS=${APP_HOSTS}"
+  while true; do
+    echo "${T} Sleeping for ${kill_sleep_time} minutes"
+    sleep $((kill_sleep_time * 60))
+
+    T="$(date +'%Y%m%d %H:%M:%S')"
+    numToWack=$((min_kill + RANDOM % max_kill))
+    nodeToWack=$(echo "${APP_HOSTS}" | cut -d" " -f1)
+    hostsArray=("${APP_HOSTS}")
+    numOfHosts=${#hostsArray[@]}
+    declare -a nodesToWack
+
+    if ((max_kill == 1)) ; then
+      echo "${T} Killing ${app_name}"
+      ssh "${nodeToWack}" "pkill -f '[ ]org.apache.accumulo.start.*${app_name}'"
+    else
+      # get the random nodes to kill
+      count=0
+      while [ $count -lt $numToWack ]; do
+        randomHostIndex=$((1 + RANDOM % numOfHosts))
+        nodeToWack=${hostsArray[randomHostIndex]}
+        # only add host to the array if its not already there
+        if [[ ! " ${nodesToWack[*]} " =~ ${nodeToWack} ]]; then
+          nodesToWack[count]=${nodeToWack}
+        fi
+        count=${#nodesToWack[@]}
+      done
+      echo "${T} Killing ${count} ${app_name} nodes"
+      for i in "${nodesToWack[@]}"; do
+        ssh "${i}" "pkill -f '[ ]org.apache.accumulo.start.*${app_name}'"

Review comment:
       Could make this pattern be a little more specific (might not be correct):
   ```suggestion
           ssh "${i}" "pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main ${app_name}'"
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] ctubbsii commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780587670



##########
File path: bin/agitator
##########
@@ -40,57 +115,85 @@ function start_agitator() {
   else
     echo >&2 "The agitator requires pssh/parallel-ssh to be installed. Aborting."; exit 1;
   fi
+  ## read configuration into env variables
+  read_cluster_conf
 
   mkdir -p "${at_home}/logs"
   log_base="${at_home}/logs/$(date +%Y%m%d%H%M%S)_$(hostname)"
-  libexec="${at_home}/libexec"
-  master_log="${log_base}_master-agitator"
-  tserver_log="${log_base}_tserver-agitator"
-  datanode_log="${log_base}_datanode-agitator"
-  master_cmd="nohup ${libexec}/master-agitator.pl $AGTR_MASTER_KILL_SLEEP_TIME $AGTR_MASTER_RESTART_SLEEP_TIME"
-  tserver_cmd="nohup ${libexec}/tserver-agitator.pl $AGTR_TSERVER_KILL_SLEEP_TIME $AGTR_TSERVER_RESTART_SLEEP_TIME $AGTR_TSERVER_MIN_KILL $AGTR_TSERVER_MAX_KILL"
-  datanode_cmd="nohup ${libexec}/datanode-agitator.pl $AGTR_DATANODE_KILL_SLEEP_TIME $AGTR_DATANODE_RESTART_SLEEP_TIME $HADOOP_HOME $AGTR_DATANODE_MIN_KILL $AGTR_DATANODE_MAX_KILL"
+  manager_log="${log_base}_manager-agitator.log"
+  tserver_log="${log_base}_tserver-agitator.log"
+  datanode_log="${log_base}_datanode-agitator.log"
+  datanode_kill_cmd="pkill -9 -f '[p]roc_datanode'"
+  datanode_start_cmd="$HADOOP_HOME/bin/hdfs --daemon start datanode"
+  manager_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main manager'"
+  manager_start_cmd="$ACCUMULO_HOME/bin/accumulo-service manager start"
+  tserver_kill_cmd="pkill -f '[ ]org[.]apache[.]accumulo[.]start[.]Main tserver'"
+  tserver_start_cmd="$ACCUMULO_HOME/bin/accumulo-service tserver start"
+
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
 
-  if [[ $AGITATOR_USER == "$AGTR_ACCUMULO_USER" ]]; then
-    echo "Running master-agitator and tserver-agitator as $AGITATOR_USER"
-    $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
-  else
-    echo "Running master-agitator and tserver-agitator as $AGTR_ACCUMULO_USER using sudo."
-    sudo --preserve-env=PSSH -u "$AGTR_ACCUMULO_USER" $master_cmd > "${master_log}.out" 2> "${master_log}.err" &
-    sudo -u "$AGTR_ACCUMULO_USER" $tserver_cmd > "${tserver_log}.out" 2> "${tserver_log}.err" &
+  if [[ $AGITATOR_USER != "$AGTR_ACCUMULO_USER" ]]; then
+    sudo -i -u "$AGTR_ACCUMULO_USER"
   fi
-  if [[ $AGITATOR_USER == "$AGTR_HDFS_USER" ]]; then
-    echo "Running datanode-agitator as $AGITATOR_USER"
-    $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
-  else
-    echo "Running datanode-agitator as $AGTR_HDFS_USER using sudo."
-    sudo -u "$AGTR_HDFS_USER" $datanode_cmd > "${datanode_log}.out" 2> "${datanode_log}.err" &
+  echo "Starting manager and tserver agitation as $(whoami)"
+  start_app_agitator manager "$AGTR_MASTER_KILL_SLEEP_TIME" "$AGTR_MASTER_RESTART_SLEEP_TIME" 1 1 "$manager_start_cmd" "$manager_kill_cmd" > "$manager_log" 2>&1 &
+  start_app_agitator tserver "$AGTR_TSERVER_KILL_SLEEP_TIME" "$AGTR_TSERVER_RESTART_SLEEP_TIME" "$AGTR_TSERVER_MIN_KILL" "$AGTR_TSERVER_MAX_KILL" "$tserver_start_cmd" "$tserver_kill_cmd" > "$tserver_log" 2>&1 &
+
+  if [[ $AGITATOR_USER != "$AGTR_HDFS_USER" ]]; then
+    sudo -i -u "$AGTR_HDFS_USER"
   fi
+  echo "Running datanode agitator as $(whoami)"
+  start_app_agitator datanode "$AGTR_DATANODE_KILL_SLEEP_TIME" "$AGTR_DATANODE_RESTART_SLEEP_TIME" "$AGTR_DATANODE_MIN_KILL" "$AGTR_DATANODE_MAX_KILL" "$datanode_start_cmd" "$datanode_kill_cmd" > "${datanode_log}" 2>&1 &
 
   if ${AGTR_HDFS:-false} ; then
     agitator_log=${log_base}_hdfs-agitator
-    sudo -u "$AGTR_HDFS_SUPERUSER" nohup "${libexec}/hdfs-agitator.pl" --sleep "${AGTR_HDFS_SLEEP_TIME}" --hdfs-cmd "${AGTR_HDFS_COMMAND}" --superuser "${AGTR_HDFS_SUPERUSER}" >"${agitator_log}.out" 2>"${agitator_log}.err" &
+    sudo -u "$AGTR_HDFS_SUPERUSER" nohup "${at_home}/libexec/hdfs-agitator.pl" --sleep "${AGTR_HDFS_SLEEP_TIME}" --hdfs-cmd "${AGTR_HDFS_COMMAND}" --superuser "${AGTR_HDFS_SUPERUSER}" >"${agitator_log}.out" 2>"${agitator_log}.err" &
   fi
 }
 
 function stop_agitator() {
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
-  if [[ $AGITATOR_USER == "$AGTR_ACCUMULO_USER" ]];  then
-    echo "Stopping all processes matching 'datanode-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f datanode-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'hdfs-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f hdfs-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'agitator.pl' as $AGITATOR_USER"
-    pkill -f agitator.pl 2>/dev/null 2>/dev/null
-  else
-    echo "Stopping all processes matching 'datanode-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f datanode-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'hdfs-agitator.pl' as $AGTR_HDFS_USER"
-    sudo -u "$AGTR_HDFS_USER" pkill -f hdfs-agitator.pl 2>/dev/null
-    echo "Stopping all processes matching 'agitator.pl' as $AGTR_ACCUMULO_USER"
-    sudo -u "$AGTR_ACCUMULO_USER" pkill -f agitator.pl 2>/dev/null
+  echo "Stopping all processes matching 'sleep' as $AGITATOR_USER"
+  pkill -f sleep 2>/dev/null
+  echo "Stopping all processes matching 'agitator' as $AGITATOR_USER"
+  pkill -f agitator 2>/dev/null
+}
+
+function parse_fail() {
+  echo "Failed to parse ${conf}/cluster.yaml"
+  exit 1
+}
+
+# Read the configuration from cluster.yaml - expects at least the manager and tservers
+function read_cluster_conf() {
+  conf="${ACCUMULO_HOME}/conf"
+  echo "Reading cluster config from ${conf}/cluster.yaml"
+  trap 'rm -f "$CONFIG_FILE"' EXIT
+  CONFIG_FILE=$(mktemp) || exit 1
+  accumulo org.apache.accumulo.core.conf.cluster.ClusterConfigParser "${conf}"/cluster.yaml > "$CONFIG_FILE" || parse_fail
+  . "$CONFIG_FILE"
+  rm -f "$CONFIG_FILE"
+
+  if [[ -z $MANAGER_HOSTS ]]; then
+    echo "ERROR: managers not found in ${conf}/cluster.yaml"
+    exit 1
+  fi
+  if [[ -z $TSERVER_HOSTS ]]; then
+    echo "ERROR: tservers not found in ${conf}/cluster.yaml"
+    exit 1
+  fi
+}
+
+# Given an app_name $1, set the array of hosts to the named reference passed in $2
+function get_app_hosts() {
+  local app_name=$1
+  local -n app_hosts=$2
+  if [[ "$app_name" == "manager" ]]; then
+    app_hosts=( $MANAGER_HOSTS )
+  fi
+
+  if [[ "$app_name" == "tserver" || "$app_name" == "datanode" ]]; then
+    app_hosts=( $TSERVER_HOSTS )
   fi

Review comment:
       The corresponding change here to my suggestion above could be:
   
   ```suggestion
     case "$1" in
       manager)  echo -n "$MANAGER_HOSTS" ;;
       tserver|datanode) echo -n "$TSERVER_HOSTS" ;;
       *) return 1 ;;
     esac
   ```
   
   The `-n` is so you don't capture a newline as part of the last host when `mapfile` reads it using a space delimiter.

##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array
+  get_app_hosts "$app_name" hosts_array
+  local num_hosts=${#hosts_array[@]}
+  local node_to_wack
+  declare -a nodes_to_wack_array

Review comment:
       My personal preference for declaring arrays is the `variable=()` syntax, rather than the `declare -a variable` syntax, only because I think the former is more obvious. The former seems modern to me and more consistent with how it is assigned/appended, and the latter seems dated. Also, `declare` makes local variables by default when used inside a function but global outside, which can be confusing, but it's obvious when doing `local variable=()`. Anyway, this is probably just personal preference.
   
   FWIW, "whack" is misspelled :smiley_cat: 

##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array
+  get_app_hosts "$app_name" hosts_array

Review comment:
       The use of a nameref in `get_app_hosts` is clever, but I think it's more complicated than necessary. It's much simpler to `echo` the hosts in `get_app_hosts` and simply capture that output into an array here:
   
   ```bash
     local hosts_array; hosts_array=($(get_app_hosts "$app_name"))
   ```
   
   This would be similar to Java:
   ```java
   String[] hostsArray = getAppHosts(appName).split(" ");
   ```
   
   Shellcheck won't like not quoting the output, even though that's what you want here. So the best way is to explicitly specify the delimiter when constructing the array, using `mapfile` (aka `readarray`):
   
   ```suggestion
     local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1007692474


   Here is some documentation on the nameref option. I had never used it before, pretty cool.
   
   <pre>
   A variable can be assigned the nameref attribute using the -n option to the declare or local builtin commands (see Bash 
   Builtins) to create a nameref, or a reference to another variable. This allows variables to be manipulated indirectly. Whenever 
   the nameref variable is referenced, assigned to, unset, or has its attributes modified (other than using or changing the nameref 
   attribute itself), the operation is actually performed on the variable specified by the nameref variable’s value.
   </pre>
   
   https://www.gnu.org/software/bash/manual/html_node/Shell-Parameters.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r780287516



##########
File path: bin/agitator
##########
@@ -30,6 +30,144 @@ Possible commands:
 EOF
 }
 
+function run_agitators() {
+  unset APP_HOSTS
+  export APP_HOSTS=$MANAGER_HOSTS
+  echo "Starting manager agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "manager" "${AGTR_MASTER_KILL_SLEEP_TIME}" "${AGTR_MASTER_RESTART_SLEEP_TIME}" 1 1 > "${manager_log}" 2>&1 &
+  unset APP_HOSTS
+  export APP_HOSTS=$TSERVER_HOSTS
+  echo "Starting tserver agitation as ${1} for ${APP_HOSTS}"
+  start_app_agitator "tserver" "${AGTR_TSERVER_KILL_SLEEP_TIME}" "${AGTR_TSERVER_RESTART_SLEEP_TIME}" "${AGTR_TSERVER_MIN_KILL}" "${AGTR_TSERVER_MAX_KILL}" > "${tserver_log}" 2>&1 &
+}
+
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  app_name=$1
+  kill_sleep_time=$2
+  restart_sleep_time=$3
+  min_kill=$4
+  max_kill=$5

Review comment:
       I see why you are shifting; so you can set the last one as an array. I think I am going to create another function to set APP_HOSTS so this is more readable.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] EdColeman commented on pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
EdColeman commented on pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#issuecomment-1008965031


   This is an interesting thread - and I'm asking because I'm curious, not because I disagree with any of the comments or suggestions.  Is there a general style guide / best practices that we can point to?  Right now, if I needed to submit / change a bash script I would be left to find one in our repo and try to match those conventions. Being the scripts might not be uniform because they were written at different points, by different authors and have evolved separately - my odds of picking a "good" example seems less than assured. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r783075897



##########
File path: bin/agitator
##########
@@ -30,6 +30,80 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array; readarray -td' ' hosts_array < <(get_app_hosts "$app_name")
+  local num_hosts=${#hosts_array[@]}
+  local node_to_kill
+  nodes_to_kill_array=()
+  local T
+  local ENV_VARS="ACCUMULO_HOME=$ACCUMULO_HOME ZOOKEEPER_HOME=$ZOOKEEPER_HOME HADOOP_HOME=$HADOOP_HOME JAVA_HOME=$JAVA_HOME"
+
+  if (( num_hosts == 0 )); then
+    echo "ERROR: No hosts were found in env for $app_name"
+    exit 1
+  fi
+  if (( max_kill > num_hosts )); then
+      echo "ERROR: Max kill $max_kill greater then number of hosts $num_hosts"
+      exit 1
+    fi
+
+  T="$(date +'%Y%m%d %H:%M:%S')"
+  echo "$T Starting $app_name agitation $kill_sleep_time $restart_sleep_time $min_kill $max_kill for ${hosts_array[*]}"

Review comment:
       Check out 2bf1872 and let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] milleruntime commented on a change in pull request #184: Fix broken agitator scripts

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #184:
URL: https://github.com/apache/accumulo-testing/pull/184#discussion_r781412071



##########
File path: bin/agitator
##########
@@ -30,6 +30,81 @@ Possible commands:
 EOF
 }
 
+# Starts a app specific agitator
+# usage: start_app_agitator app_name kill_sleep_time restart_sleep_time min_kill max_kill start_cmd kill_cmd
+# Requires that a list of hosts to be set in $APP_HOSTS
+function start_app_agitator() {
+  local app_name=$1
+  local kill_sleep_time=$2
+  local restart_sleep_time=$3
+  local min_kill=$4
+  local max_kill=$5
+  local start_cmd=$6
+  local kill_cmd=$7
+  local hosts_array
+  get_app_hosts "$app_name" hosts_array
+  local num_hosts=${#hosts_array[@]}
+  local node_to_wack
+  declare -a nodes_to_wack_array

Review comment:
       Fixed in 2ab7f6e




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org