Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2024/03/06 15:50:00 UTC
[jira] [Created] (FLINK-34589) FineGrainedSlotManager doesn't handle errors in the resource reconciliation step
Matthias Pohl created FLINK-34589:
-------------------------------------
Summary: FineGrainedSlotManager doesn't handle errors in the resource reconciliation step
Key: FLINK-34589
URL: https://issues.apache.org/jira/browse/FLINK-34589
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.18.1, 1.19.0, 1.20.0
Reporter: Matthias Pohl
While working on FLINK-34427, I noticed that the resource reconciliation is scheduled periodically when the {{SlotManager}} starts, but errors in this step are not handled. I see two options here:
1. Fail fatally because such an error might indicate a major issue with the RM backend.
2. Log the failure and continue the scheduled task even in case of an error.
My understanding is that such an error only means we're unable to recreate TaskManagers, which should be a transient issue that could be resolved in the backend (YARN, k8s). That's why I would lean towards option 2.
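To illustrate option 2, here is a minimal sketch that does not use Flink's actual classes. It assumes a plain java.util.concurrent.ScheduledExecutorService, where an exception escaping a periodic task suppresses all subsequent executions. The names ReconcileSketch, logAndContinue and survivesFailure are hypothetical and only serve the example; the failing task merely simulates a backend error.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is not the FineGrainedSlotManager code.
class ReconcileSketch {

    /** Wraps a periodic task so that a failure is logged and later runs still happen. */
    static Runnable logAndContinue(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                // Assumption: a real component would use its own logger here.
                System.err.println("Resource reconciliation failed, retrying on next run: " + e);
            }
        };
    }

    /**
     * Schedules a task whose first run throws, then waits until it has been
     * invoked requiredRuns times. Returns true if that happened within 5 seconds.
     */
    static boolean survivesFailure(int requiredRuns) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(requiredRuns);

        Runnable reconcile = () -> {
            int n = runs.incrementAndGet();
            latch.countDown();
            if (n == 1) {
                // Simulated transient failure of the resource backend.
                throw new IllegalStateException("simulated backend failure");
            }
        };

        try {
            // Without the wrapper, the first exception would cancel the periodic task.
            scheduler.scheduleWithFixedDelay(
                    logAndContinue(reconcile), 0, 10, TimeUnit.MILLISECONDS);
            return latch.await(5, TimeUnit.SECONDS);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("kept running after failure: " + survivesFailure(3));
    }
}
```

Option 1 would instead forward the caught exception to a fatal error handler rather than logging it. The sketch catches only Exception, so a java.lang.Error would still end the periodic task.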
[~xtsong] WDYT?
--
This message was sent by Atlassian Jira
(v8.20.10#820010)