You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/28 13:54:21 UTC

[jira] [Commented] (FLINK-3257) Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs

    [ https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529691#comment-15529691 ] 

ASF GitHub Bot commented on FLINK-3257:
---------------------------------------

Github user senorcarbone commented on the issue:

    https://github.com/apache/flink/pull/1668
  
    Hey! Good to be back :) . Let's fix this properly, as @StephanEwen recommended it now that there is some time.
    We are writing together with @FouadMA a FLIP to address major loop fixes. Namely, termination determination and fault tolerance. The termination implementation is already in a good shape in my opinion and you can find it [here](https://github.com/senorcarbone/flink/pull/2#pullrequestreview-1929918) so you want to take an early look. The description in the FLIP will make clear of how this works in detail. 
    
    The FT update for loops will be rebase on top of the loop termination fix.
    We hope that you will find this good too and btw thanks for your patience :)


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
>                 Key: FLINK-3257
>                 URL: https://issues.apache.org/jira/browse/FLINK-3257
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Paris Carbone
>            Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution graph. An alternative scheme can potentially include records in-transit through the back-edges of a cyclic execution graph (ABS [1]) to achieve the same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start block output and start upstream backup of all records forwarded from the respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource should finalize the snapshot, unblock its output and emit all records in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)