Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2022/04/22 10:04:00 UTC

[jira] [Created] (FLINK-27354) JobMaster still processes requests while terminating

Matthias Pohl created FLINK-27354:
-------------------------------------

             Summary: JobMaster still processes requests while terminating
                 Key: FLINK-27354
                 URL: https://issues.apache.org/jira/browse/FLINK-27354
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.14.4, 1.13.6, 1.15.0
            Reporter: Matthias Pohl


An issue was reported in the [user ML|https://lists.apache.org/thread/5pm3crntmb1hl17h4txnlhjz34clghrg] about the JobMaster trying to reconnect to the ResourceManager during shutdown.

The JobMaster disconnects from the ResourceManager during shutdown (see [JobMaster:1182|https://github.com/apache/flink/blob/da532423487e0534b5fe61f5a02366833f76193a/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L1182]). This triggers the deregistration of the job in the {{ResourceManager}}. The RM responds asynchronously at the end of this deregistration through {{disconnectResourceManager}} (see [ResourceManager:993|https://github.com/apache/flink/blob/da532423487e0534b5fe61f5a02366833f76193a/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L993]), which triggers a reconnect on the {{JobMaster}}'s side (see [JobMaster::disconnectResourceManager|https://github.com/apache/flink/blob/da532423487e0534b5fe61f5a02366833f76193a/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMaster.java#L789]) if the {{JobMaster}} is still around, because the {{resourceManagerAddress}} (used in {{isConnectingToResourceManager}}) is not cleared. This would only happen during an RM leader change.

The {{disconnectResourceManager}} call is ignored if the {{JobMaster}} is already gone.

We should add some kind of guard to the {{JobMaster}} to prevent it from reconnecting to other components during shutdown. This might not be limited to the ResourceManager connection; other parts of the {{JobMaster}} API might be affected as well.
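
One possible shape for such a guard, as a minimal sketch only: the class {{GuardedJobMasterSketch}}, the {{terminating}} flag and the {{startTermination}} method below are hypothetical names invented for illustration and are not part of Flink. The sketch models the late {{disconnectResourceManager}} callback as a plain method call and assumes all calls arrive on a single thread.

```java
/** Hypothetical, heavily simplified stand-in for the JobMaster; not Flink's actual class. */
class GuardedJobMasterSketch {
    // Stays set after the disconnect, mirroring the uncleared resourceManagerAddress.
    private final String resourceManagerAddress;
    // Hypothetical guard flag, set once shutdown has started.
    private boolean terminating = false;
    // Counts how often a reconnect would have been triggered.
    private int reconnectAttempts = 0;

    GuardedJobMasterSketch(String resourceManagerAddress) {
        this.resourceManagerAddress = resourceManagerAddress;
    }

    /** Hypothetical hook, assumed to be called at the beginning of shutdown. */
    void startTermination() {
        terminating = true;
    }

    /**
     * Models the asynchronous callback from the ResourceManager.
     * Returns true if a reconnect was triggered, false if the call was ignored.
     */
    boolean disconnectResourceManager() {
        if (terminating) {
            // Guard: never reconnect while shutting down.
            return false;
        }
        if (resourceManagerAddress != null) {
            reconnectAttempts++;
            return true;
        }
        return false;
    }

    int getReconnectAttempts() {
        return reconnectAttempts;
    }
}

class GuardDemo {
    public static void main(String[] args) {
        GuardedJobMasterSketch jobMaster = new GuardedJobMasterSketch("rm-address");
        System.out.println("running: " + jobMaster.disconnectResourceManager());
        jobMaster.startTermination();
        System.out.println("terminating: " + jobMaster.disconnectResourceManager());
    }
}
```

The same check could be placed in front of other {{JobMaster}} entry points if they turn out to be affected as well. Whether a flag like this or a more general mechanism is the better fit is left open here.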





--
This message was sent by Atlassian Jira
(v8.20.7#820007)