Posted to commits@beam.apache.org by "Eugene Kirpichov (JIRA)" <ji...@apache.org> on 2017/11/10 18:42:00 UTC

[jira] [Comment Edited] (BEAM-3169) WriteFiles data loss with some triggers

    [ https://issues.apache.org/jira/browse/BEAM-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247891#comment-16247891 ] 

Eugene Kirpichov edited comment on BEAM-3169 at 11/10/17 6:41 PM:
------------------------------------------------------------------

I think the proper fix is either to make that "GBK onto void key" step use a pass-through triggering strategy, or to make finalization a simple ParDo, so that every FileResult is guaranteed to be processed.
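To make the two proposed fixes concrete, here is a toy, single-process model in plain Java. It is not Beam code and not the actual WriteFiles implementation. It assumes, hypothetically, that FileResults reach finalization in several batches over time, and that the problematic trigger on the void-key GBK fires once and then finishes. `FinalizeModel`, `viaVoidKeyGbk` and `viaParDo` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Beam code) of finalizing files that arrive in batches. */
class FinalizeModel {

    /**
     * Models the "GBK onto void key": all results share one key and are emitted only when
     * the trigger fires. If the trigger finishes after its first firing, later batches land
     * in a closed window and are dropped, so their temp files are never finalized.
     */
    static List<String> viaVoidKeyGbk(List<List<String>> batches,
                                      boolean triggerFinishesAfterFirstFiring) {
        List<String> finalized = new ArrayList<>();
        boolean finished = false;
        for (List<String> batch : batches) {
            if (finished) {
                continue; // closed window: these FileResults are silently lost
            }
            finalized.addAll(batch); // trigger fires and emits this pane
            finished = triggerFinishesAfterFirstFiring;
        }
        return finalized;
    }

    /** Models finalization as a simple per-element ParDo: no trigger, nothing dropped. */
    static List<String> viaParDo(List<List<String>> batches) {
        List<String> finalized = new ArrayList<>();
        for (List<String> batch : batches) {
            for (String fileResult : batch) {
                finalized.add(fileResult); // each FileResult is processed as it arrives
            }
        }
        return finalized;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
                List.of("tmp/a", "tmp/b"), List.of("tmp/c"), List.of("tmp/d"));
        System.out.println(viaVoidKeyGbk(batches, true));  // prints [tmp/a, tmp/b]
        System.out.println(viaVoidKeyGbk(batches, false)); // prints [tmp/a, tmp/b, tmp/c, tmp/d]
        System.out.println(viaParDo(batches));             // prints [tmp/a, tmp/b, tmp/c, tmp/d]
    }
}
```

In this model, a one-shot trigger finalizes only the first batch. Either a trigger that keeps firing (the pass-through idea) or a per-element ParDo finalizes all four files.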

Pipelines potentially prone to this are those that use WriteFiles with an input trigger whose continuation trigger may drop data. I need to understand continuation triggers better to identify which triggers those are.

For an affected user, a possible workaround is to use a trigger that doesn't drop data, e.g. Repeatedly.forever(something) combined with a high allowed lateness.
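Why that combination helps can be sketched with a second toy model. Roughly, Beam drops input for a window once its trigger has finished, or once the watermark has passed the window's end plus the allowed lateness. The plain-Java sketch below models only those two rules and is not Beam's implementation. Its simplifying assumptions are: every element belongs to one window ending at `windowEnd`, and the trigger fires on each arrival, like AfterPane.elementCountAtLeast(1). `LatenessModel` and `processed` are hypothetical names. In real Beam code the workaround would be expressed through `Window.into(...).triggering(Repeatedly.forever(...)).withAllowedLateness(...)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Beam's implementation) of when one window drops input. */
class LatenessModel {

    /**
     * Returns the indices of arrivals that get processed.
     *
     * @param watermarkAtArrival input watermark at the moment each element arrives
     * @param windowEnd          end of the elements' window (same unit as the watermarks)
     * @param allowedLateness    how long after windowEnd the window still accepts input
     * @param repeated           true models Repeatedly.forever(trigger); false models a
     *                           trigger that finishes after its first firing
     */
    static List<Integer> processed(long[] watermarkAtArrival, long windowEnd,
                                   long allowedLateness, boolean repeated) {
        List<Integer> kept = new ArrayList<>();
        boolean triggerFinished = false;
        for (int i = 0; i < watermarkAtArrival.length; i++) {
            boolean tooLate = watermarkAtArrival[i] > windowEnd + allowedLateness;
            if (tooLate || triggerFinished) {
                continue; // dropped: past allowed lateness, or window already closed
            }
            kept.add(i);
            if (!repeated) {
                triggerFinished = true; // one-shot trigger closes the window
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Watermark (e.g. in seconds) when each of four elements arrives; the window
        // ends at 60, so the last two elements arrive late.
        long[] watermarks = {10, 50, 70, 400};
        System.out.println(processed(watermarks, 60, 0, false));   // prints [0]
        System.out.println(processed(watermarks, 60, 0, true));    // prints [0, 1]
        System.out.println(processed(watermarks, 60, 3600, true)); // prints [0, 1, 2, 3]
    }
}
```

Only a repeated trigger together with a lateness bound that covers the late arrivals keeps all four elements.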

> WriteFiles data loss with some triggers
> ---------------------------------------
>
>                 Key: BEAM-3169
>                 URL: https://issues.apache.org/jira/browse/BEAM-3169
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>    Affects Versions: 2.0.0, 2.1.0, 2.2.0
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>            Priority: Critical
>             Fix For: 2.2.0
>
>
> https://stackoverflow.com/questions/47113773/dataflow-2-1-0-streaming-application-is-not-cleaning-temp-folders/47142671?noredirect=1#comment81401472_47142671
> Details in comments



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)