You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Mason Chen <ma...@gmail.com> on 2022/02/25 20:50:34 UTC

SourceEvent Task Failure Handling

Hi Devs,

I am implementing a custom connector with the FLIP27 interface and I have a
question about how source events are handled with respect to
task/taskmanager failure. In my implementation, the enumerator sends source
events to readers so that readers can react to split changes detected by
the enumerator.

How are source events handled in the cases of task failure/task manager
failure? Are they completely lost or is there an internal mechanism to
re-apply source events upon reader/task/taskmanager restart?

I'm thinking I would need to keep track of processed source events in
enumerator state and resend them if they weren't processed before
reader/task/taskmanager failure. I'd like to avoid the extra work of this
mechanism and it sounds very similar to the addSplitsBack functionality, so
I'm looking for suggestions for what else I might be able to leverage.

Best,
Mason

Re: SourceEvent Task Failure Handling

Posted by Qingsheng Ren <re...@gmail.com>.
Hi Mason, 

Unfortunately from SplitEnumerator’s aspect, events will be lost if they are not delivered to the SourceReader. 

On OperatorCoordinator level, if an OperatorEvent is not delivered successfully, the coordinator will trigger a subtask failover, then OperatorCoordinator#subtaskReset will be invoked so that coordinator could be able to handle it. However, SourceCoordinator only handles splits on subtask reset (via addSplitsBack) so state of split assignment will be consistent, but other custom events will be lost :-( I think a FLIP is needed to resolve this because an interface change on SplitEnumerator is required to handle custom event loss. 

Best regards, 

Qingsheng Ren


> On Feb 26, 2022, at 4:50 AM, Mason Chen <ma...@gmail.com> wrote:
> 
> Hi Devs,
> 
> I am implementing a custom connector with the FLIP27 interface and I have a
> question about how source events are handled with respect to
> task/taskmanager failure. In my implementation, the enumerator sends source
> events to readers so that readers can react to split changes detected by
> the enumerator.
> 
> How are source events handled in the cases of task failure/task manager
> failure? Are they completely lost or is there an internal mechanism to
> re-apply source events upon reader/task/taskmanager restart?
> 
> I'm thinking I would need to keep track of processed source events in
> enumerator state and resend them if they weren't processed before
> reader/task/taskmanager failure. I'd like to avoid the extra work of this
> mechanism and it sounds very similar to the addSplitsBack functionality, so
> I'm looking for suggestions for what else I might be able to leverage.
> 
> Best,
> Mason