Posted to issues@flink.apache.org by "Matthias Pohl (Jira)" <ji...@apache.org> on 2024/03/07 06:37:00 UTC

[jira] [Updated] (FLINK-34589) FineGrainedSlotManager doesn't handle errors in the resource reconciliation step

     [ https://issues.apache.org/jira/browse/FLINK-34589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Pohl updated FLINK-34589:
----------------------------------
    Issue Type: Technical Debt  (was: Bug)
      Priority: Minor  (was: Major)

> FineGrainedSlotManager doesn't handle errors in the resource reconciliation step
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-34589
>                 URL: https://issues.apache.org/jira/browse/FLINK-34589
>             Project: Flink
>          Issue Type: Technical Debt
>          Components: Runtime / Coordination
>    Affects Versions: 1.19.0, 1.18.1, 1.20.0
>            Reporter: Matthias Pohl
>            Priority: Minor
>
> While working on FLINK-34427, I noticed that the resource reconciliation is scheduled periodically when the {{SlotManager}} starts, but errors thrown in this step are not handled. I see two options here:
> 1. Fail fatally, because such an error might indicate a major issue with the RM backend.
> 2. Log the failure and keep the scheduled task running even if an error occurs.
> My understanding is that such an error only means we are unable to recreate TaskManagers, which should be a transient issue that can be resolved in the backend (YARN, k8s). That's why I lean towards option 2.
> [~xtsong] WDYT?
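
To make the two options concrete, here is a minimal sketch in plain Java. It is not Flink's actual code: {{ReconcileSketch}}, {{guarded}} and the simulated {{reconcile}} task are hypothetical names, and a plain {{java.util.concurrent.ScheduledExecutorService}} stands in for whatever executor the {{SlotManager}} really uses. With that JDK executor, a periodic task that throws has its subsequent executions suppressed, so an unhandled error silently stops the schedule. Wrapping the task lets the caller decide what happens instead: pass a logging callback for option 2, or a callback that terminates the process for option 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in Flink.
class ReconcileSketch {

    /**
     * Wraps a periodic task so that a failure is handed to onError instead of
     * escaping to the executor. Option 2: onError logs. Option 1: onError
     * triggers a fatal shutdown. A real implementation may want to treat
     * JVM-level errors (e.g. OutOfMemoryError) differently.
     */
    static Runnable guarded(Runnable task, Consumer<Throwable> onError) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                onError.accept(t);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        AtomicInteger failures = new AtomicInteger();
        CountDownLatch threeRuns = new CountDownLatch(3);

        // Simulated reconciliation step: every odd attempt fails, as if the
        // backend were temporarily unable to provide resources.
        Runnable reconcile = () -> {
            int attempt = attempts.incrementAndGet();
            threeRuns.countDown();
            if (attempt % 2 == 1) {
                throw new IllegalStateException("backend unavailable (simulated), attempt " + attempt);
            }
        };

        // Option 2: log the failure and keep the schedule alive.
        executor.scheduleWithFixedDelay(
                guarded(reconcile, t -> {
                    failures.incrementAndGet();
                    System.err.println("Reconciliation failed, will retry: " + t.getMessage());
                }),
                0L, 10L, TimeUnit.MILLISECONDS);

        // Without the guard, the first failure would cancel the schedule and
        // this latch would never reach zero.
        boolean reached = threeRuns.await(5, TimeUnit.SECONDS);
        executor.shutdownNow();
        System.out.println("reached three runs: " + reached);
    }
}
```

The same wrapper covers option 1 by swapping the callback, which keeps the choice between the two behaviours in a single place.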



--
This message was sent by Atlassian Jira
(v8.20.10#820010)