You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/01/13 19:06:00 UTC

[GitHub] [accumulo-testing] DomGarguilo opened a new pull request #187: Eliminate possible race condition in `agitator stop` script

DomGarguilo opened a new pull request #187:
URL: https://github.com/apache/accumulo-testing/pull/187


   This PR addresses a possible race condition that was identified below:
   
   > 
   > @DomGarguilo I noticed a quirk while testing that I think could be a follow on task. For the stop agitator, I made the `pkill` kill the sleep first and then anything matching `agitator`. This works but I noticed that the script may run before being killed and introduces a race condition. Here is the log from the tserver agitator:
   > <pre>
   > 20220112 12:37:19 Starting tserver agitation. Kill every 20 minutes, restart every 10 minutes.
   > 20220112 12:37:19 Will randomly kill between 1 and 1 of the following: localhost
   > 20220112 12:37:19 Sleeping for 20 minutes
   > Terminated
   > 20220112 12:49:54 Killing tserver at localhost
   > </pre>
   > 
   > There should be a way to kill the agitator process and all its child processes (i.e. the sleep function) with one command.
   
   _Originally posted by @milleruntime in https://github.com/apache/accumulo-testing/issues/184#issuecomment-1011324701_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #187: Eliminate possible race condition in `agitator stop` script

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #187:
URL: https://github.com/apache/accumulo-testing/pull/187#discussion_r785168367



##########
File path: bin/agitator
##########
@@ -157,10 +157,13 @@ function start_agitator() {
 
 function stop_agitator() {
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
-  echo "Stopping all processes matching 'sleep' as $AGITATOR_USER"
-  pkill -f sleep 2>/dev/null
-  echo "Stopping all processes matching 'agitator' as $AGITATOR_USER"
-  pkill -f agitator 2>/dev/null
+  echo "Stopping all processes in the same process group as 'agitator' as $AGITATOR_USER"
+  ## get process ids of agitator processes then just keep the first
+  agitator_pid=$(pgrep -f agitator | head -1)
+  ## get the group process id of the agitator process then clear whitespace
+  group_pid=$(ps -p "${agitator_pid}" -o pgid= | xargs)
+  ## kill all processes in process group, should be all agitators and their sleep processes
+  kill -- -"${group_pid}"

Review comment:
       Addressed in e768fc1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #187: Eliminate possible race condition in `agitator stop` script

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #187:
URL: https://github.com/apache/accumulo-testing/pull/187#discussion_r785168367



##########
File path: bin/agitator
##########
@@ -157,10 +157,13 @@ function start_agitator() {
 
 function stop_agitator() {
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
-  echo "Stopping all processes matching 'sleep' as $AGITATOR_USER"
-  pkill -f sleep 2>/dev/null
-  echo "Stopping all processes matching 'agitator' as $AGITATOR_USER"
-  pkill -f agitator 2>/dev/null
+  echo "Stopping all processes in the same process group as 'agitator' as $AGITATOR_USER"
+  ## get process ids of agitator processes then just keep the first
+  agitator_pid=$(pgrep -f agitator | head -1)
+  ## get the group process id of the agitator process then clear whitespace
+  group_pid=$(ps -p "${agitator_pid}" -o pgid= | xargs)
+  ## kill all processes in process group, should be all agitators and their sleep processes
+  kill -- -"${group_pid}"

Review comment:
       Addresses in e768fc1




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on pull request #187: Eliminate possible race condition in `agitator stop` script

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on pull request #187:
URL: https://github.com/apache/accumulo-testing/pull/187#issuecomment-1012517542


   > If this works as intended, then I'm fine with it.
   
   From my testing it works well but there might be something I'm overlooking. Essentially what happens is the process group ID is found (which is shared among all of the processes we want to kill) then the kill command is issued to that group.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] ctubbsii commented on a change in pull request #187: Eliminate possible race condition in `agitator stop` script

Posted by GitBox <gi...@apache.org>.
ctubbsii commented on a change in pull request #187:
URL: https://github.com/apache/accumulo-testing/pull/187#discussion_r785130004



##########
File path: bin/agitator
##########
@@ -157,10 +157,13 @@ function start_agitator() {
 
 function stop_agitator() {
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
-  echo "Stopping all processes matching 'sleep' as $AGITATOR_USER"
-  pkill -f sleep 2>/dev/null
-  echo "Stopping all processes matching 'agitator' as $AGITATOR_USER"
-  pkill -f agitator 2>/dev/null
+  echo "Stopping all processes in the same process group as 'agitator' as $AGITATOR_USER"
+  ## get process ids of agitator processes then just keep the first
+  agitator_pid=$(pgrep -f agitator | head -1)
+  ## get the group process id of the agitator process then clear whitespace
+  group_pid=$(ps -p "${agitator_pid}" -o pgid= | xargs)
+  ## kill all processes in process group, should be all agitators and their sleep processes
+  kill -- -"${group_pid}"

Review comment:
       I think this will work better if you want to make use of arrays to get all agitator processes
   
   ```suggestion
     ## get process ids of all agitator processes (return 1 if none found)
     local agitator_pids=(); agitator_pids=($(pgrep -f agitator)) || return 1
     ## get the group process ids of all the agitator processes
     local group_pids=(); group_pids=($(ps -o pgid= -p "${agitator_pids[@]}"))
     ## kill all processes in the process groups (should include agitators and their sleep processes)
     kill -- "${group_pids[@]/#/-}"
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo commented on a change in pull request #187: Eliminate possible race condition in `agitator stop` script

Posted by GitBox <gi...@apache.org>.
DomGarguilo commented on a change in pull request #187:
URL: https://github.com/apache/accumulo-testing/pull/187#discussion_r785161700



##########
File path: bin/agitator
##########
@@ -157,10 +157,13 @@ function start_agitator() {
 
 function stop_agitator() {
   [[ -n $AGITATOR_USER ]] || AGITATOR_USER=$(whoami)
-  echo "Stopping all processes matching 'sleep' as $AGITATOR_USER"
-  pkill -f sleep 2>/dev/null
-  echo "Stopping all processes matching 'agitator' as $AGITATOR_USER"
-  pkill -f agitator 2>/dev/null
+  echo "Stopping all processes in the same process group as 'agitator' as $AGITATOR_USER"
+  ## get process ids of agitator processes then just keep the first
+  agitator_pid=$(pgrep -f agitator | head -1)
+  ## get the group process id of the agitator process then clear whitespace
+  group_pid=$(ps -p "${agitator_pid}" -o pgid= | xargs)
+  ## kill all processes in process group, should be all agitators and their sleep processes
+  kill -- -"${group_pid}"

Review comment:
       This looks like its working. I added some print statements to see what this was doing and everything checks out. The only thing I noticed was the `pgrep -f agitator` call is actually picking up the `pid` of the `./bin/agitator stop` bash script itself which ultimately is killed via its gpid further along in the script. Not sure if this could cause some issues or not. Maybe we could pgrep for the string "agitator start" to avoid this. I'll give that a try.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-testing] DomGarguilo merged pull request #187: Eliminate possible race condition in `agitator stop` script

Posted by GitBox <gi...@apache.org>.
DomGarguilo merged pull request #187:
URL: https://github.com/apache/accumulo-testing/pull/187


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org