You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@storm.apache.org by "Jungtaek Lim (JIRA)" <ji...@apache.org> on 2016/11/23 10:05:58 UTC

[jira] [Commented] (STORM-2176) Workers do not shutdown cleanly and worker hooks don't run when a topology is killed

    [ https://issues.apache.org/jira/browse/STORM-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689620#comment-15689620 ] 

Jungtaek Lim commented on STORM-2176:
-------------------------------------

I can reproduce it with Storm 1.1.0 SNAPSHOT: running org.apache.storm.starter.WordCountTopology to remote cluster always reproduces it.

One thiing I found is, log messages might not be written properly while shutdown is being in progress.
I changed the code to leave thread dump just before halting:

{code}
(defn add-shutdown-hook-with-force-kill-in-1-sec
  "adds the user supplied function as a shutdown hook for cleanup.
   Also adds a function that sleeps for a second and then sends kill -9 to process to avoid any zombie process in case
   cleanup function hangs."
  [func]
  (.addShutdownHook (Runtime/getRuntime) (Thread. #(func)))
  (.addShutdownHook (Runtime/getRuntime) (Thread. #((sleep-secs 1)
                                                     (log-message "Shutdown doesn't end in 1 sec, force shutdown")
                                                     (log-message (Utils/threadDump))
                                                     (print "Shutdown doesn't end in 1 sec, force shutdown")
                                                     (print (Utils/threadDump))
                                                    (.halt (Runtime/getRuntime) 20)))))
{code}

With multiple workers I can't get any log messages. After reducing worker into one I can get stack trace log but only from  STDERR (not printed twice), and it seems not print whole stack trace.

{code}
2016-11-23 18:37:59.535 o.a.s.d.worker Thread-59 [INFO] Shutting down worker wordcount-3-1479893791 95a6817f-678d-40a5-ac78-4b6047b4e77f 6700
2016-11-23 18:37:59.535 o.a.s.d.worker Thread-59 [INFO] Terminating messaging context
2016-11-23 18:37:59.535 o.a.s.d.worker Thread-59 [INFO] Shutting down executors
2016-11-23 18:37:59.536 o.a.s.d.executor Thread-59 [INFO] Shutting down executor count:[8 8]
2016-11-23 18:38:00.547 STDERR Thread-2 [INFO] Shutdown doesn't end in 1 sec, force shutdown"SIGTERM handler"
2016-11-23 18:38:00.550 STDERR Thread-2 [INFO]    lock: java.lang.Class@1153f092 owner: Thread-63
2016-11-23 18:38:00.551 STDERR Thread-2 [INFO]    java.lang.Thread.State: BLOCKED
2016-11-23 18:38:00.551 STDERR Thread-2 [INFO]         at java.lang.Shutdown.exit(Shutdown.java:212)
2016-11-23 18:38:00.551 STDERR Thread-2 [INFO]         at java.lang.Terminator$1.handle(Terminator.java:52)
2016-11-23 18:38:00.551 STDERR Thread-2 [INFO]         at sun.misc.Signal$1.run(Signal.java:212)
2016-11-23 18:38:00.551 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.551 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.551 STDERR Thread-2 [INFO] "Thread-60"
2016-11-23 18:38:00.552 STDERR Thread-2 [INFO]    lock: null owner: null
2016-11-23 18:38:00.552 STDERR Thread-2 [INFO]    java.lang.Thread.State: RUNNABLE
2016-11-23 18:38:00.552 STDERR Thread-2 [INFO]         at sun.management.ThreadImpl.getThreadInfo1(Native Method)
2016-11-23 18:38:00.552 STDERR Thread-2 [INFO]         at sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:174)
2016-11-23 18:38:00.552 STDERR Thread-2 [INFO]         at org.apache.storm.utils.Utils.threadDump(Utils.java:1284)
2016-11-23 18:38:00.553 STDERR Thread-2 [INFO]         at org.apache.storm.util$add_shutdown_hook_with_force_kill_in_1_sec$fn__496.invoke(util.clj:466)
2016-11-23 18:38:00.553 STDERR Thread-2 [INFO]         at clojure.lang.AFn.run(AFn.java:22)
2016-11-23 18:38:00.553 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.553 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.553 STDERR Thread-2 [INFO] "process reaper"
2016-11-23 18:38:00.554 STDERR Thread-2 [INFO]    lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
2016-11-23 18:38:00.554 STDERR Thread-2 [INFO]    java.lang.Thread.State: TIMED_WAITING
2016-11-23 18:38:00.554 STDERR Thread-2 [INFO]         at sun.misc.Unsafe.park(Native Method)
2016-11-23 18:38:00.554 STDERR Thread-2 [INFO]         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2016-11-23 18:38:00.554 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
2016-11-23 18:38:00.555 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
2016-11-23 18:38:00.555 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
2016-11-23 18:38:00.555 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
2016-11-23 18:38:00.555 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
2016-11-23 18:38:00.556 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-11-23 18:38:00.556 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.556 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.556 STDERR Thread-2 [INFO] "Thread-65"
2016-11-23 18:38:00.556 STDERR Thread-2 [INFO]    lock: java.lang.Class@1153f092 owner: Thread-63
2016-11-23 18:38:00.557 STDERR Thread-2 [INFO]    java.lang.Thread.State: BLOCKED
2016-11-23 18:38:00.557 STDERR Thread-2 [INFO]         at java.lang.Shutdown.exit(Shutdown.java:212)
2016-11-23 18:38:00.559 STDERR Thread-2 [INFO]         at java.lang.Runtime.exit(Runtime.java:109)
2016-11-23 18:38:00.559 STDERR Thread-2 [INFO]         at java.lang.System.exit(System.java:971)
2016-11-23 18:38:00.560 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt.die(ShellBolt.java:321)
2016-11-23 18:38:00.560 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt.access$400(ShellBolt.java:70)
2016-11-23 18:38:00.560 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt$BoltWriterRunnable.run(ShellBolt.java:417)
2016-11-23 18:38:00.560 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.560 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.560 STDERR Thread-2 [INFO] "Thread-63"
2016-11-23 18:38:00.560 STDERR Thread-2 [INFO]    lock: java.lang.Thread@7209ffb5 owner: null
2016-11-23 18:38:00.561 STDERR Thread-2 [INFO]    java.lang.Thread.State: WAITING
2016-11-23 18:38:00.561 STDERR Thread-2 [INFO]         at java.lang.Object.wait(Native Method)
2016-11-23 18:38:00.561 STDERR Thread-2 [INFO]         at java.lang.Thread.join(Thread.java:1245)
2016-11-23 18:38:00.561 STDERR Thread-2 [INFO]         at java.lang.Thread.join(Thread.java:1319)
2016-11-23 18:38:00.561 STDERR Thread-2 [INFO]         at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
2016-11-23 18:38:00.561 STDERR Thread-2 [INFO]         at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
2016-11-23 18:38:00.562 STDERR Thread-2 [INFO]         at java.lang.Shutdown.runHooks(Shutdown.java:123)
2016-11-23 18:38:00.562 STDERR Thread-2 [INFO]         at java.lang.Shutdown.sequence(Shutdown.java:167)
2016-11-23 18:38:00.562 STDERR Thread-2 [INFO]         at java.lang.Shutdown.exit(Shutdown.java:212)
2016-11-23 18:38:00.562 STDERR Thread-2 [INFO]         at java.lang.Runtime.exit(Runtime.java:109)
2016-11-23 18:38:00.562 STDERR Thread-2 [INFO]         at java.lang.System.exit(System.java:971)
2016-11-23 18:38:00.562 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt.die(ShellBolt.java:321)
2016-11-23 18:38:00.562 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt.access$400(ShellBolt.java:70)
2016-11-23 18:38:00.563 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:386)
2016-11-23 18:38:00.563 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.563 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.563 STDERR Thread-2 [INFO] "Thread-61"
2016-11-23 18:38:00.563 STDERR Thread-2 [INFO]    lock: java.lang.Class@1153f092 owner: Thread-63
2016-11-23 18:38:00.564 STDERR Thread-2 [INFO]    java.lang.Thread.State: BLOCKED
2016-11-23 18:38:00.564 STDERR Thread-2 [INFO]         at java.lang.Shutdown.exit(Shutdown.java:212)
2016-11-23 18:38:00.564 STDERR Thread-2 [INFO]         at java.lang.Runtime.exit(Runtime.java:109)
2016-11-23 18:38:00.564 STDERR Thread-2 [INFO]         at java.lang.System.exit(System.java:971)
2016-11-23 18:38:00.564 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt.die(ShellBolt.java:321)
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt.access$400(ShellBolt.java:70)
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO]         at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:386)
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO] "process reaper"
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO]    lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO]    java.lang.Thread.State: TIMED_WAITING
2016-11-23 18:38:00.565 STDERR Thread-2 [INFO]         at sun.misc.Unsafe.park(Native Method)
2016-11-23 18:38:00.566 STDERR Thread-2 [INFO]         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2016-11-23 18:38:00.566 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
2016-11-23 18:38:00.566 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
2016-11-23 18:38:00.566 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
2016-11-23 18:38:00.566 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
2016-11-23 18:38:00.566 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
2016-11-23 18:38:00.566 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-11-23 18:38:00.567 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.567 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.567 STDERR Thread-2 [INFO] "process reaper"
2016-11-23 18:38:00.567 STDERR Thread-2 [INFO]    lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
2016-11-23 18:38:00.567 STDERR Thread-2 [INFO]    java.lang.Thread.State: TIMED_WAITING
2016-11-23 18:38:00.567 STDERR Thread-2 [INFO]         at sun.misc.Unsafe.park(Native Method)
2016-11-23 18:38:00.568 STDERR Thread-2 [INFO]         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2016-11-23 18:38:00.568 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
2016-11-23 18:38:00.568 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
2016-11-23 18:38:00.568 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
2016-11-23 18:38:00.568 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO] "process reaper"
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO]    lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO]    java.lang.Thread.State: TIMED_WAITING
2016-11-23 18:38:00.569 STDERR Thread-2 [INFO]         at sun.misc.Unsafe.park(Native Method)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-11-23 18:38:00.570 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.571 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.571 STDERR Thread-2 [INFO] "process reaper"
2016-11-23 18:38:00.571 STDERR Thread-2 [INFO]    lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
2016-11-23 18:38:00.571 STDERR Thread-2 [INFO]    java.lang.Thread.State: TIMED_WAITING
2016-11-23 18:38:00.572 STDERR Thread-2 [INFO]         at sun.misc.Unsafe.park(Native Method)
2016-11-23 18:38:00.572 STDERR Thread-2 [INFO]         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2016-11-23 18:38:00.572 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
2016-11-23 18:38:00.572 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
2016-11-23 18:38:00.572 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
2016-11-23 18:38:00.573 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
2016-11-23 18:38:00.573 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
2016-11-23 18:38:00.573 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-11-23 18:38:00.573 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.573 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.573 STDERR Thread-2 [INFO] "process reaper"
2016-11-23 18:38:00.574 STDERR Thread-2 [INFO]    lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
2016-11-23 18:38:00.574 STDERR Thread-2 [INFO]    java.lang.Thread.State: TIMED_WAITING
2016-11-23 18:38:00.574 STDERR Thread-2 [INFO]         at sun.misc.Unsafe.park(Native Method)
2016-11-23 18:38:00.574 STDERR Thread-2 [INFO]         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
2016-11-23 18:38:00.575 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
2016-11-23 18:38:00.575 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
2016-11-23 18:38:00.575 STDERR Thread-2 [INFO]         at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
2016-11-23 18:38:00.575 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
2016-11-23 18:38:00.575 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
2016-11-23 18:38:00.575 STDERR Thread-2 [INFO]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-11-23 18:38:00.575 STDERR Thread-2 [INFO]         at java.lang.Thread.run(Thread.java:745)
2016-11-23 18:38:00.576 STDERR Thread-2 [INFO]
2016-11-23 18:38:00.576 STDERR Thread-2 [INFO] "process reaper"
2016-11-23 18:38:00.582 STDERR Thread-2 [INFO]    lock: java.util.concurrent.SynchronousQueue$Trans
{code}

so as worker.log.err.

{code}
Shutdown doesn't end in 1 sec, force shutdown"SIGTERM handler"
   lock: java.lang.Class@1153f092 owner: Thread-63
   java.lang.Thread.State: BLOCKED
        at java.lang.Shutdown.exit(Shutdown.java:212)
        at java.lang.Terminator$1.handle(Terminator.java:52)
        at sun.misc.Signal$1.run(Signal.java:212)
        at java.lang.Thread.run(Thread.java:745)

"Thread-60"
   lock: null owner: null
   java.lang.Thread.State: RUNNABLE
        at sun.management.ThreadImpl.getThreadInfo1(Native Method)
        at sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:174)
        at org.apache.storm.utils.Utils.threadDump(Utils.java:1284)
        at org.apache.storm.util$add_shutdown_hook_with_force_kill_in_1_sec$fn__496.invoke(util.clj:466)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.lang.Thread.run(Thread.java:745)

"process reaper"
   lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"Thread-65"
   lock: java.lang.Class@1153f092 owner: Thread-63
   java.lang.Thread.State: BLOCKED
        at java.lang.Shutdown.exit(Shutdown.java:212)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:971)
        at org.apache.storm.task.ShellBolt.die(ShellBolt.java:321)
        at org.apache.storm.task.ShellBolt.access$400(ShellBolt.java:70)
        at org.apache.storm.task.ShellBolt$BoltWriterRunnable.run(ShellBolt.java:417)
        at java.lang.Thread.run(Thread.java:745)

"Thread-63"
   lock: java.lang.Thread@7209ffb5 owner: null
   java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1245)
        at java.lang.Thread.join(Thread.java:1319)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
        at java.lang.Shutdown.runHooks(Shutdown.java:123)
        at java.lang.Shutdown.sequence(Shutdown.java:167)
        at java.lang.Shutdown.exit(Shutdown.java:212)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:971)
        at org.apache.storm.task.ShellBolt.die(ShellBolt.java:321)
        at org.apache.storm.task.ShellBolt.access$400(ShellBolt.java:70)
        at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:386)
        at java.lang.Thread.run(Thread.java:745)

"Thread-61"
   lock: java.lang.Class@1153f092 owner: Thread-63
   java.lang.Thread.State: BLOCKED
        at java.lang.Shutdown.exit(Shutdown.java:212)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:971)
        at org.apache.storm.task.ShellBolt.die(ShellBolt.java:321)
        at org.apache.storm.task.ShellBolt.access$400(ShellBolt.java:70)
        at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:386)
        at java.lang.Thread.run(Thread.java:745)

"process reaper"
   lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"process reaper"
   lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"process reaper"
   lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"process reaper"
   lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"process reaper"
   lock: java.util.concurrent.SynchronousQueue$TransferStack@1f940cc8 owner: null
   java.lang.Thread.State: TIMED_WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

"process reaper"
   lock: java.util.concurrent.SynchronousQueue$Trans
{code}

Supervisor log for that worker is below:
{code}
2016-11-23 18:37:59.493 SLOT_6700 o.a.s.d.s.Container [INFO] Killing 95a6817f-678d-40a5-ac78-4b6047b4e77f:0337f37e-f9a2-4278-a855-075425127b32
2016-11-23 18:38:00.601 Thread-57 o.a.s.d.s.BasicContainer [INFO] Worker Process 0337f37e-f9a2-4278-a855-075425127b32 exited with code: 20
{code}

[~ptgoetz]
Is it multilang topology, or normal topology? In multilang topology, worker often get into other issue when killing topology so it might affect shutdown process, but I'm not sure.

{code}
2016-11-23 18:37:59.517 o.a.s.t.ShellBolt Thread-63 [ERROR] Halting process: ShellBolt died. Command: [python, splitsentence.py], ProcessInfo pid:20336, name:split exitCode:143, errorString:
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:


	at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:352) [storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
2016-11-23 18:37:59.521 o.a.s.d.executor Thread-63 [ERROR]
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:


	at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:352) [storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
2016-11-23 18:37:59.529 o.a.s.t.ShellBolt Thread-61 [ERROR] Halting process: ShellBolt died. Command: [python, splitsentence.py], ProcessInfo pid:20337, name:split exitCode:143, errorString:
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:


	at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:352) [storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66]
2016-11-23 18:37:59.530 o.a.s.d.executor Thread-61 [ERROR]
java.lang.RuntimeException: org.apache.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:


	at org.apache.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:127) ~[storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
	at org.apache.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:352) [storm-core-1.1.0-SNAPSHOT.jar:1.1.0-SNAPSHOT]
{code}

> Workers do not shutdown cleanly and worker hooks don't run when a topology is killed
> ------------------------------------------------------------------------------------
>
>                 Key: STORM-2176
>                 URL: https://issues.apache.org/jira/browse/STORM-2176
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 1.0.1, 1.0.2
>            Reporter: P. Taylor Goetz
>            Priority: Critical
>
> This appears to have been introduced in the 1.0.0 release. The issues does not seem to affect 0.10.2.
> When a topology is killed and workers receive the notification to shutdown, they do not shutdown cleanly, so worker hooks never get invoked.
> When a worker shuts down cleanly, the worker logs should contain entries such as the following:
> {code}
> 2016-10-28 18:52:06.273 b.s.d.worker [INFO] Shut down transfer thread
> 2016-10-28 18:52:06.279 b.s.d.worker [INFO] Shutting down default resources
> 2016-10-28 18:52:06.287 b.s.d.worker [INFO] Shut down default resources
> 2016-10-28 18:52:06.351 b.s.d.worker [INFO] Disconnecting from storm cluster state context
> 2016-10-28 18:52:06.359 b.s.d.worker [INFO] Shut down worker exclaim-1-1477680593 61bddd66-0fda-4556-b742-4b63f0df6fc1 6700
> {code}
> In the 1.0.x line of releases (and presumably 1.x, though I haven't checked) this does not happen -- the worker shutdown process appears to get stuck shutting down executors (https://github.com/apache/storm/blob/v1.0.2/storm-core/src/clj/org/apache/storm/daemon/worker.clj#L666), no further log messages are seen in the worker log, and worker hooks do not run.
> There are two properties that affect how workers exit. The first is the configuration property {{supervisor.worker.shutdown.sleep.secs}}, which defaults to 1 second. This corresponds to how long the supervisor will wait for a worker to exit gracefully before forcibly killing it with {{kill -9}}. When this happens the supervisor will log that the worker terminated with exit code 137 (128 + 9).
> The second property is a hard-coded 1 second delay (https://github.com/apache/storm/blob/v1.0.2/storm-core/src/clj/org/apache/storm/util.clj#L463) added as a shutdown hook that will call {{Runtime.halt()}} if the delay is exceeded. When this happens, the supervisor will log that the worker terminated with exit code 20 (hard-coded).
> Side Note: The hardcoded halt delay in worker.clj and the default value for {{supervisor.worker.shutdown.sleep.secs}} both being 1 second should probably be changed since it creates a race to see whether the supervisor delay or the worker delay wins.
> To test this, I set {{supervisor.worker.shutdown.sleep.secs}} to 15 to allow plenty of time for the worker to exit gracefully, and deployed and killed a topology. In this case the supervisor consistently reported exit code 20 for the worker, indicating the hard-coded shutdown hook caused the worker to exit.
> I thought the hard-coded 1 second shutdown hook delay might not be long enough for the worker to shutdown cleanly. To test that hypothesis, I changed the hard-code delay to 10 seconds, leaving {{supervisor.worker.shutdown.sleep.secs}} at 15 seconds. Again supervisor reported an exit code of 20 for the worker, and there were no log messages indicating the worker had exited cleanly and that the worker hook had run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)