Posted to issues@flink.apache.org by "Till Rohrmann (Jira)" <ji...@apache.org> on 2020/08/07 10:31:00 UTC

[jira] [Closed] (FLINK-16510) Task manager safeguard shutdown may not be reliable

     [ https://issues.apache.org/jira/browse/FLINK-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Till Rohrmann closed FLINK-16510.
---------------------------------
    Resolution: Fixed

Fixed via

master: 182be5145b5b45cfedc65abca5dfe723651aa38e
1.11.2: 6782254aadad50d2879a626eb39efeafc93fae13
1.10.2: 8c134867dfab5c330da9458c3db9a54ed324ccab

> Task manager safeguard shutdown may not be reliable
> ---------------------------------------------------
>
>                 Key: FLINK-16510
>                 URL: https://issues.apache.org/jira/browse/FLINK-16510
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.10.1, 1.12.0, 1.11.1
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.2, 1.12.0, 1.11.2
>
>         Attachments: command.txt, stack2-1.txt, stack3-mixed.txt, stack3.txt
>
>
> The {{JvmShutdownSafeguard}} does not always succeed; it can hang when multiple threads attempt to shut down the JVM. Apparently, combining {{System.exit()}} and its shutdown hooks with a forced termination of the JVM via {{Runtime.halt()}} does not work well:
> {noformat}
> "Jvm Terminator" #22 daemon prio=5 os_prio=0 tid=0x00007fb8e82f2800 nid=0x5a96 runnable [0x00007fb35cffb000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.Shutdown.$$YJP$$halt0(Native Method)
> 	at java.lang.Shutdown.halt0(Shutdown.java)
> 	at java.lang.Shutdown.halt(Shutdown.java:139)
> 	- locked <0x000000047ed67638> (a java.lang.Shutdown$Lock)
> 	at java.lang.Runtime.halt(Runtime.java:276)
> 	at org.apache.flink.runtime.util.JvmShutdownSafeguard$DelayedTerminator.run(JvmShutdownSafeguard.java:86)
> 	at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
> 	- None
> "FlinkCompletableFutureDelayScheduler-thread-1" #18154 daemon prio=5 os_prio=0 tid=0x00007fb708a7d000 nid=0x5a8a waiting for monitor entry [0x00007fb289d49000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> 	at java.lang.Shutdown.halt(Shutdown.java:139)
> 	- waiting to lock <0x000000047ed67638> (a java.lang.Shutdown$Lock)
> 	at java.lang.Shutdown.exit(Shutdown.java:213)
> 	- locked <0x000000047edb7348> (a java.lang.Class for java.lang.Shutdown)
> 	at java.lang.Runtime.exit(Runtime.java:110)
> 	at java.lang.System.exit(System.java:973)
> 	at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.terminateJVM(TaskManagerRunner.java:266)
> 	at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$onFatalError$1(TaskManagerRunner.java:260)
> 	at org.apache.flink.runtime.taskexecutor.TaskManagerRunner$$Lambda$27464/1464672548.accept(Unknown Source)
> 	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
> 	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
> 	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
> 	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> 	at org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:943)
> 	at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211)
> 	at org.apache.flink.runtime.concurrent.FutureUtils.lambda$orTimeout$11(FutureUtils.java:361)
> 	at org.apache.flink.runtime.concurrent.FutureUtils$$Lambda$27435/159015392.run(Unknown Source)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
> 	- <0x00000006d5e56bd0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> {noformat}
> Note that under this condition the JVM should terminate, but it still hangs. Sometimes it exits only after several minutes.
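
For readers unfamiliar with the pattern in the "Jvm Terminator" trace above, here is a minimal sketch of a delayed-halt safeguard: a shutdown hook starts a daemon thread that calls {{Runtime.halt()}} if the regular shutdown has not finished within a delay. It is not Flink's actual {{JvmShutdownSafeguard}} code. The class name DelayedHaltSketch, the method installSafeguard, the injectable haltAction, the 5 second delay and the exit code 1 are all assumptions made for illustration.

```java
final class DelayedHaltSketch {

    /** Sleeps for the configured delay, then runs the halt action. */
    static final class DelayedTerminator implements Runnable {
        private final long delayMillis;
        private final Runnable haltAction; // hypothetical hook, so the sketch can run without killing the JVM

        DelayedTerminator(long delayMillis, Runnable haltAction) {
            this.delayMillis = delayMillis;
            this.haltAction = haltAction;
        }

        @Override
        public void run() {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // An interrupt should not cancel the safeguard; fall through and halt.
            }
            haltAction.run();
        }
    }

    /** Registers a shutdown hook that starts the delayed terminator as a daemon thread. */
    static Thread installSafeguard(long delayMillis, Runnable haltAction) {
        Thread hook = new Thread(() -> {
            Thread terminator = new Thread(
                    new DelayedTerminator(delayMillis, haltAction), "Jvm Terminator");
            terminator.setDaemon(true); // a daemon thread does not itself keep the JVM alive
            terminator.start();
        }, "shutdown-safeguard-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // Assumed values: 5 s delay, exit code 1. Runtime.halt() skips shutdown hooks and finalizers.
        installSafeguard(5_000L, () -> Runtime.getRuntime().halt(1));
        System.out.println("safeguard installed");
        // When main returns, the JVM begins its normal shutdown and runs the hook.
        // The halt only fires if that shutdown takes longer than the delay.
    }
}
```

The sketch does not reproduce the hang. In the reported trace, the terminator thread is inside {{Shutdown.halt()}} holding the {{java.lang.Shutdown$Lock}} monitor, while the thread that called {{System.exit()}} is blocked waiting for that same monitor.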



--
This message was sent by Atlassian Jira
(v8.3.4#803005)