Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2024/03/07 06:37:00 UTC
[jira] [Updated] (FLINK-34589) FineGrainedSlotManager doesn't handle errors in the resource reconciliation step
[ https://issues.apache.org/jira/browse/FLINK-34589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Pohl updated FLINK-34589:
----------------------------------
Issue Type: Technical Debt (was: Bug)
Priority: Minor (was: Major)
> FineGrainedSlotManager doesn't handle errors in the resource reconciliation step
> --------------------------------------------------------------------------------
>
> Key: FLINK-34589
> URL: https://issues.apache.org/jira/browse/FLINK-34589
> Project: Flink
> Issue Type: Technical Debt
> Components: Runtime / Coordination
> Affects Versions: 1.19.0, 1.18.1, 1.20.0
> Reporter: Matthias Pohl
> Priority: Minor
>
> I noticed during my work on FLINK-34427 that the reconciliation is scheduled periodically when starting the {{SlotManager}}, but errors in this step are not handled. I see two options here:
> 1. Fail fatally because such an error might indicate a major issue with the RM backend.
> 2. Log the failure and continue the scheduled task even in case of an error.
> My understanding is that such an error only means we're not able to recreate TaskManagers, which should be a transient issue that can be resolved in the backend (YARN, k8s). That's why I would lean towards option 2.
> [~xtsong] WDYT?
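
Option 2 from the description above can be sketched as follows. This is a minimal illustration using only the plain JDK ScheduledExecutorService, not Flink's own scheduling code; the class name ReconciliationSketch, the helper logAndContinue and the failing step are hypothetical names made up for the example. With the JDK executor, an exception thrown by a periodic task suppresses all subsequent executions, so the wrapper catches the failure, logs it, and lets the next scheduled run proceed.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ReconciliationSketch {
    private static final Logger LOG =
            Logger.getLogger(ReconciliationSketch.class.getName());

    /**
     * Hypothetical helper: wraps a periodic step so that a failure is logged
     * instead of propagating to the executor (option 2 in the issue).
     */
    public static Runnable logAndContinue(Runnable step) {
        return () -> {
            try {
                step.run();
            } catch (Exception e) {
                // Assumption: the failure is transient, so we only report it
                // and rely on the next scheduled run to retry.
                LOG.log(Level.WARNING,
                        "Resource reconciliation failed; retrying on next run.", e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger attempts = new AtomicInteger();

        // Stand-in for the reconciliation step: fails on the first attempt only.
        Runnable flakyStep = () -> {
            if (attempts.incrementAndGet() == 1) {
                throw new IllegalStateException("backend temporarily unavailable");
            }
        };

        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();
        // Without the wrapper, the first exception would cancel the periodic task.
        executor.scheduleWithFixedDelay(
                logAndContinue(flakyStep), 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(500);
        executor.shutdownNow();
        System.out.println("attempts: " + attempts.get());
    }
}
```

Option 1 would instead forward the caught exception to a fatal error handler rather than logging it; which component would receive it in Flink is not covered by this sketch.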
--
This message was sent by Atlassian Jira
(v8.20.10#820010)