You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2024/03/19 08:13:00 UTC
[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

    [ https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828221#comment-17828221 ] 

Matthias Pohl commented on FLINK-34672:
---------------------------------------

IMHO, the both locks are still necessary because we're accessing the state in the {{JobMasterServiceLeadershipRunner}} and the leadership in {{DefaultLeaderElectionService}}. I also verified that this is not something that was introduced in Flink 1.18 with the [FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box] changes. AFAIS, it can also happen in 1.17- (I didn't check the pre-FLINK-24038 code but only looked into {{release-1.17}}).

One solution would be to move the async callback of [JobMasterServiceLeadershipRunner#forwardIfValidLeader|https://github.com/apache/flink/blob/c9fcb0c74b1354f4f0f1b7c7f62191b8cc6b5725/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L547] into the leader-operation executor of the {{DefaultLeaderElectionService}} to force sequential execution of leadership-related operations.

This is only an issue in the {{JobMasterServiceLeadershipRunner}} because we're executing the creation asynchronously in an io thread. The other place where we check within the contender whether leadership is acquired is the {{DefaultDispatcherRunner}}. But we're not doing any async calls there during leadership handling (the {{DefaultDispatcherRunner}} is created directly in the leader-operation executor while handling the leadership acquired event).

> HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-34672
>                 URL: https://issues.apache.org/jira/browse/FLINK-34672
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.18.1
>            Reporter: Chesnay Schepler
>            Priority: Major
>             Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> We recently observed a deadlock in the JM within the HA system.
> (see below for the thread dump)
> [~mapohl] and I looked a bit into it and there appears to be a race condition when leadership is revoked while a JobMaster is being started.
> It appears to be caused by {{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} forwarding futures while holding a lock; depending on whether the forwarded future is already complete the next stage may or may not run while holding that same lock.
> We haven't determined yet whether we should be holding that lock or not.
> {code}
> "DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 daemon prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x00007f531f43d000 nid=0x19d waiting for monitor entry  [0x00007f53084fd000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462)
>         - waiting to lock <0x00000000f1c0e088> (a java.lang.Object)
>         at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x0000000840ddec40.accept(Unknown Source)
>         at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x0000000840dcf840.run(Unknown Source)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549)
>         - locked <0x00000000f0e3f4d8> (a java.lang.Object)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x0000000840c23040.run(Unknown Source)
>         at java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
>         at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
> {code}
> {code}
> "jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms elapsed=78699.01s tid=0x00007f5321c6e800 nid=0x396 waiting for monitor entry  [0x00007f530567d000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366)
>         - waiting to lock <0x00000000f0e3f4d8> (a java.lang.Object)
>         at org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52)
>         at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509)
>         at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520)
>         - locked <0x00000000f1c0e088> (a java.lang.Object)
>         at org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$1320/0x0000000840e1a840.accept(Unknown Source)
>         at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.22/CompletableFuture.java:859)
>         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.22/CompletableFuture.java:837)
>         at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.22/CompletableFuture.java:506)
>         at java.util.concurrent.CompletableFuture.complete(java.base@11.0.22/CompletableFuture.java:2079)
>         at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.registerJobMasterServiceFutures(DefaultJobMasterServiceProcess.java:124)
>         at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess.lambda$new$0(DefaultJobMasterServiceProcess.java:114)
>         at org.apache.flink.runtime.jobmaster.DefaultJobMasterServiceProcess$$Lambda$1319/0x0000000840e1a440.accept(Unknown Source)
>         at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.22/CompletableFuture.java:859)
>         at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.22/CompletableFuture.java:837)
>         at java.util.concurrent.CompletableFuture.postComplete(java.base@11.0.22/CompletableFuture.java:506)
>         at java.util.concurrent.CompletableFuture$AsyncSupply.run(java.base@11.0.22/CompletableFuture.java:1705)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
>         at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)