You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Qingsheng Ren <re...@gmail.com> on 2022/03/03 07:47:19 UTC

Re: SourceEvent Task Failure Handling

Hi Mason, 

Unfortunately from SplitEnumerator’s aspect, events will be lost if they are not delivered to the SourceReader. 

On OperatorCoordinator level, if an OperatorEvent is not delivered successfully, the coordinator will trigger a subtask failover, then OperatorCoordinator#subtaskReset will be invoked so that coordinator could be able to handle it. However, SourceCoordinator only handles splits on subtask reset (via addSplitsBack) so state of split assignment will be consistent, but other custom events will be lost :-( I think a FLIP is needed to resolve this because an interface change on SplitEnumerator is required to handle custom event loss. 

Best regards, 

Qingsheng Ren


> On Feb 26, 2022, at 4:50 AM, Mason Chen <ma...@gmail.com> wrote:
> 
> Hi Devs,
> 
> I am implementing a custom connector with the FLIP27 interface and I have a
> question about how source events are handled with respect to
> task/taskmanager failure. In my implementation, the enumerator sends source
> events to readers so that readers can react to split changes detected by
> the enumerator.
> 
> How are source events handled in the cases of task failure/task manager
> failure? Are they completely lost or is there an internal mechanism to
> re-apply source events upon reader/task/taskmanager restart?
> 
> I'm thinking I would need to keep track of processed source events in
> enumerator state and resend them if they weren't processed before
> reader/task/taskmanager failure. I'd like to avoid the extra work of this
> mechanism and it sounds very similar to the addSplitsBack functionality, so
> I'm looking for suggestions for what else I might be able to leverage.
> 
> Best,
> Mason