Posted to issues@flink.apache.org by "Yun Gao (Jira)" <ji...@apache.org> on 2022/04/13 06:28:04 UTC

[jira] [Updated] (FLINK-25316) BlobServer can get stuck during shutdown

     [ https://issues.apache.org/jira/browse/FLINK-25316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yun Gao updated FLINK-25316:
----------------------------
    Fix Version/s: 1.16.0

> BlobServer can get stuck during shutdown
> ----------------------------------------
>
>                 Key: FLINK-25316
>                 URL: https://issues.apache.org/jira/browse/FLINK-25316
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.0
>            Reporter: Robert Metzger
>            Priority: Minor
>             Fix For: 1.15.0, 1.16.0
>
>
> The cluster shutdown can get stuck, with the shutdown thread waiting in Thread.join() inside BlobServer.close():
> {code}
> "AkkaRpcService-Supervisor-Termination-Future-Executor-thread-1" #89 daemon prio=5 os_prio=0 tid=0x0000004017d70000 nid=0x2ec in Object.wait() [0x000000402a9b5000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <0x00000000d6c48368> (a org.apache.flink.runtime.blob.BlobServer)
> 	at java.lang.Thread.join(Thread.java:1252)
> 	- locked <0x00000000d6c48368> (a org.apache.flink.runtime.blob.BlobServer)
> 	at java.lang.Thread.join(Thread.java:1326)
> 	at org.apache.flink.runtime.blob.BlobServer.close(BlobServer.java:319)
> 	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.stopClusterServices(ClusterEntrypoint.java:406)
> 	- locked <0x00000000d5d27350> (a java.lang.Object)
> 	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$shutDownAsync$4(ClusterEntrypoint.java:505)
> {code}
> because the BlobServer.run() method ignores interrupts; the listener thread is still blocked in ServerSocket.accept():
> {code}
> "BLOB Server listener at 6124" #30 daemon prio=5 os_prio=0 tid=0x000000401c929800 nid=0x2b4 runnable [0x00000040263f9000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.net.PlainSocketImpl.socketAccept(Native Method)
> 	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
> 	at java.net.ServerSocket.implAccept(ServerSocket.java:560)
> 	at java.net.ServerSocket.accept(ServerSocket.java:528)
> 	at org.apache.flink.util.NetUtils.acceptWithoutTimeout(NetUtils.java:143)
> 	at org.apache.flink.runtime.blob.BlobServer.run(BlobServer.java:268)
> {code}
> This issue was introduced in FLINK-24156 and first mentioned in https://issues.apache.org/jira/browse/FLINK-24113?focusedCommentId=17459414&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17459414
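
The two thread dumps above follow a general JDK pattern: a thread blocked in java.net.ServerSocket.accept() is normally not woken by Thread.interrupt(), so a Thread.join() on it can wait forever unless the socket itself is closed. The following is a minimal, self-contained sketch of that behavior. It is not Flink code and not the fix applied for this issue; the class name AcceptShutdownDemo and its methods are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical demo class, not part of Flink.
class AcceptShutdownDemo {

    /** Starts a daemon thread that blocks in accept() on the given server socket. */
    static Thread startListener(ServerSocket serverSocket) {
        Thread listener = new Thread(() -> {
            try {
                // Blocks until a client connects or the server socket is closed.
                serverSocket.accept();
            } catch (IOException e) {
                // Closing the server socket from another thread ends up here.
            }
        }, "demo-listener");
        listener.setDaemon(true);
        listener.start();
        return listener;
    }

    /** Returns true if interrupt() alone makes the listener exit within waitMillis. */
    static boolean stopsAfterInterrupt(long waitMillis) throws Exception {
        ServerSocket serverSocket = new ServerSocket(0); // port 0: any free port
        try {
            Thread listener = startListener(serverSocket);
            Thread.sleep(200); // give the listener time to enter accept()
            listener.interrupt();
            listener.join(waitMillis); // bounded join, so the demo itself cannot hang
            return !listener.isAlive();
        } finally {
            serverSocket.close(); // clean up; this also releases the listener
        }
    }

    /** Returns true if closing the server socket makes the listener exit within waitMillis. */
    static boolean stopsAfterClose(long waitMillis) throws Exception {
        ServerSocket serverSocket = new ServerSocket(0);
        Thread listener = startListener(serverSocket);
        Thread.sleep(200);
        serverSocket.close(); // accept() fails with an exception and the thread ends
        listener.join(waitMillis);
        return !listener.isAlive();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stops after interrupt: " + stopsAfterInterrupt(1000));
        System.out.println("stops after close:     " + stopsAfterClose(1000));
    }
}
```

With platform threads on common JVMs, the first check is expected to report false and the second true. The sketch uses a bounded join(waitMillis) so that it cannot hang itself, whereas the unbounded join() in the first trace has no such escape.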



--
This message was sent by Atlassian Jira
(v8.20.1#820001)