You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "EdColeman (via GitHub)" <gi...@apache.org> on 2023/04/07 14:53:31 UTC

[GitHub] [accumulo] EdColeman opened a new issue, #3282: TableServer shutdown may not be working as expected.

EdColeman opened a new issue, #3282:
URL: https://github.com/apache/accumulo/issues/3282

   **Describe the bug**
   The shutdown code in TableServer may not be working as expected.  The code at https://github.com/apache/accumulo/blob/7f49667f1347ba446d0f8e31271df8ba7ee48397/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L871 is looping on a variable (shutdownComplete) that never seems to be updated to allow the loop to exit and the rest of the shutdown code a chance to run.
   
   **Versions (OS, Maven, Java, and others, as appropriate):**
    - Affected version(s) of this project: seems to have existed since 1.4
   
   **Expected behavior**
   On shutdown, either all of the code should get a chance to run - or things restructured so that it is expected that none of the code needs to run and the loop removed.
   
   We are probably handing the issues where the rest of the code never runs on recovery (assuming that it never runs now). Basically if shutdown and a `kill -9` are treated the same, then the rest of the shutdown code that is not running now may be unnecessary.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on issue #3282: TabletServer shutdown may not be working as expected.

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on issue #3282:
URL: https://github.com/apache/accumulo/issues/3282#issuecomment-1500901715

   I looked at ScanServer and it uses the same [serverStopRequested](https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/ScanServer.java#L373) variable but doesn't have the `shutdownComplete` flag. On shutdown it just tries to close stuff in a finally block but I assume has the same behavior as TabletServer where the finally doesn't really get executed since halt will just stop the JVM.
   
   As a test locally I took out the `shutdownComplete` flag and loop to see what would happen and none of the code that is after where the loop was is executed as the JVM shutsdown too fast when halt is called so doesn't seem like it provides much value even if we got rid of the broken loop. Also, I am wondering why we are using halt and not exit as halt won't ever call shutdown hooks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on issue #3282: TableServer shutdown may not be working as expected.

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on issue #3282:
URL: https://github.com/apache/accumulo/issues/3282#issuecomment-1500364214

   I investigated this too and this variable is never set and has been like this since as far back as the history goes in Github. So we are clearly handling shutdown in some other way. It's not super clear to me from the comments if this sleep here is even still relevant or necessary but since this has been like this for such a long time any change here would definitely need to be well tested.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on issue #3282: TableServer shutdown may not be working as expected.

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on issue #3282:
URL: https://github.com/apache/accumulo/issues/3282#issuecomment-1500672319

   This was bugging me so I looked into why shutdown works and it's the JVM is halted so it never even hits that loop to wait.
   
   
   1. The manager sends a shutdown command to the Tserver. The Tserver gets it and eventually inside the TabletClientHandler it makes a call to [Halt.halt()](https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletClientHandler.java#L1261) to initialize the JVM shutdown.
   2. The call to Halt.halt() executes a [runnable](https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/util/Halt.java#L60) which [sets](https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletClientHandler.java#L1264) the `serverStopRequested` to true so the loop inside of TabletServer breaks.
   3. Immediately after the runnable is finished halt continues and [halts](https://github.com/apache/accumulo/blob/main/core/src/main/java/org/apache/accumulo/core/util/Halt.java#L62) the JVM so the JVM shutsdown.
   
   The end result is the JVM has shutdown initialized at the same time so in my testing the JVM exited before the `shutdownComplete` block is ever hit so none of the code was even executed. I would assume in theory it could hit that [loop](https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L872) if the JVM took a while to shutdown. I guess in that guess it would just spin until the halt was finished.
   
   So it seems like based on the behavior (and the [comment](https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L865), although the comment is hard to understand) that this behavior of waiting on the JVM to halt is intentional so the JVM closes the other threads before that thread is exited. But if that's the case this is kind of a weird way to do it and not super clear and I also don't understand the point of all the other code that exists after the while loop as it won't ever be executed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on issue #3282: TabletServer shutdown may not be working as expected.

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on issue #3282:
URL: https://github.com/apache/accumulo/issues/3282#issuecomment-1500682192

   As a side note I am wondering if there is a Spotbugs rule for something like this. It was caught because Intellij picked up on the fact that the variable could be set to be final since it was never mutated and private so I would think spotbugs could detect the loop checking a variable that is never going to be true.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org