You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@storm.apache.org by GitBox <gi...@apache.org> on 2020/06/29 21:08:07 UTC

[GitHub] [storm] Ethanlm opened a new pull request #3299: [STORM-3658] Avoid deadlock and race condition caused by shutdown hooks

Ethanlm opened a new pull request #3299:
URL: https://github.com/apache/storm/pull/3299


   ## What is the purpose of the change
   
   Details explained in https://issues.apache.org/jira/browse/STORM-3658.
   
   In short, the problem here is 
   1. there could be a race condition between registering the shutdown hooks and invoking shutdown hooks
   2. There is a possible deadlock between a thread like executor thread that invokes shutdown hook (when it invokes Runtime.getRuntime().exit) and the shutdown hook/thread itself that tries to terminate threads like the executor thread.
   
   ## How was the change tested
   
   1. The deadlock and race condition can be reproduced with a WordCountTopology whose WordCountBolt simply throws a RuntimeException in `prepare` method.  Use it to validate the change. The race condition and deadlock is not happening. 
   
   The log below shows shutdown hook waited for 100ms and continue. This avoids the deadlock.
   ```
   2020-06-29 20:33:44.237 o.a.s.e.ExecutorShutdown ShutdownHook-shutdownFunc [WARN] Thread Thread-15-count-executor[1, 1] is still alive (100 ms after interruption). Stop wait
   ing for it.
   ```
   
   2. The shutdown hooks are registered at very early stage (before any other threads are spawned). This is validated by looking at the logs and see `Adding shutdown hook with kill in 3 secs` message appears very early. And I added a bug in my test to print out thread dump right after registering the shutdown hook. The only alive threads at that point are
   ```
   Log4j2-TF-6-Scheduled-1
   Reference Handler
   Signal Dispatcher
   Finalizer
   main
   ```
   And also tested with the above topology and don't see race condition happen again (while it was very easy to happen before the code change)
   
   3. thread name change can be validated by the log
   ```
   2020-06-29 20:33:44.023 o.a.s.u.Utils ShutdownHook-sleepKill-3s [INFO] Halting after 3 seconds
   ....
   2020-06-29 20:33:44.040 o.a.s.d.w.Worker ShutdownHook-shutdownFunc [INFO] Shutting down worker wc4-15-1593462538 513216e5-17e1-421f-809e-47cefd6eb136-10.215.73.209 6702
   ...
   ```
   
   4. `workerstate` could be null if exception happens before `workerstate` object is initialized properly. This was tested by throwing an exception before `workerstate` construction is all done. And the modified shutdown method doesn't throw NPE.
   
   5. I noticed that logs like 
   ```
   Trigger any worker shutdown hooks
   Disconnecting from storm cluster state context
   Shut down worker
   ```
   don't show up in the worker log file with or without this code change, with a good or bad topology. That should be taken care of separately.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] Ethanlm commented on pull request #3299: [STORM-3658] Avoid deadlock and race condition caused by shutdown hooks

Posted by GitBox <gi...@apache.org>.
Ethanlm commented on pull request #3299:
URL: https://github.com/apache/storm/pull/3299#issuecomment-652507975


   Last commit re-organized shutdown method to avoid checking `workerState!=null` in many different places. Tested again.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [storm] Ethanlm merged pull request #3299: [STORM-3658] Avoid deadlock and race condition caused by shutdown hooks

Posted by GitBox <gi...@apache.org>.
Ethanlm merged pull request #3299:
URL: https://github.com/apache/storm/pull/3299


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org