Posted to dev@flink.apache.org by "WONG, DAREN" <da...@amazon.co.uk.INVALID> on 2022/08/08 14:18:40 UTC

[jira] (FLINK-24343) Revisit Scheduler and Coordinator Startup Procedure

Dear FLINK community,

I am looking at JIRA FLINK-24343 and trying to understand what this sentence means: “The scheduler must be started to the point that it can handle "failGlobal()" calls, because the coordinators might trigger that during their startup when an exception in "start()" occurs.”
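
To make the question concrete, here is a toy model of how I read that sentence. It is only a sketch under my own assumptions: ToyScheduler, ToyCoordinator, prepareFailureHandling() and the readyForFailures flag are hypothetical names I made up for illustration, not Flink classes or methods, and I do not know whether the real scheduler has any such "not ready" state.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupOrderSketch {

    /** Hypothetical stand-in for a scheduler; NOT Flink's actual scheduler class. */
    static class ToyScheduler {
        // Assumption: some internal state must be set up before failures can be handled.
        private boolean readyForFailures = false;
        final List<Throwable> handledFailures = new ArrayList<>();

        void prepareFailureHandling() {
            readyForFailures = true;
        }

        void handleGlobalFailure(Throwable cause) {
            if (!readyForFailures) {
                throw new IllegalStateException("scheduler not ready to handle global failures");
            }
            handledFailures.add(cause);
        }
    }

    /** Hypothetical coordinator whose start() fails and reports the failure globally. */
    static class ToyCoordinator {
        private final ToyScheduler scheduler;

        ToyCoordinator(ToyScheduler scheduler) {
            this.scheduler = scheduler;
        }

        void start() {
            try {
                throw new RuntimeException("coordinator start failed");
            } catch (RuntimeException e) {
                // This is the "failGlobal()" call made during coordinator startup.
                scheduler.handleGlobalFailure(e);
            }
        }
    }

    /** Runs the startup in the given order and returns how many failures were handled. */
    static int startInOrder(boolean schedulerFirst) {
        ToyScheduler scheduler = new ToyScheduler();
        ToyCoordinator coordinator = new ToyCoordinator(scheduler);
        if (schedulerFirst) {
            scheduler.prepareFailureHandling();
        }
        coordinator.start();
        return scheduler.handledFailures.size();
    }

    public static void main(String[] args) {
        System.out.println("scheduler first, failures handled: " + startInOrder(true));
        try {
            startInOrder(false);
        } catch (IllegalStateException e) {
            System.out.println("coordinator first, rejected: " + e.getMessage());
        }
    }
}
```

In this toy model the order matters only because I gave the scheduler an explicit "not ready" state. Whether the real scheduler has an equivalent of that state, and where it is set, is exactly what I cannot find.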

I have been looking at the source code, but I am not sure where in the code we can say that the scheduler is started to the point that it can handle a “failGlobal()” call. From the code, I see that schedulers that extend SchedulerBase, such as DefaultScheduler, implement “handleGlobalFailure()”, so I thought the scheduler could handle “failGlobal()” from the get-go, that is, as soon as it is created in JobMaster. I am sure I am missing something and there is a gap in my understanding. If you have any context on this, could you share some information or point me in the right direction, please? Thank you very much.


Regards,
Daren