Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2020/09/02 17:08:33 UTC

[jira] [Commented] (BEAM-3353) Prohibit stacked GBKs with accumulating mode

    [ https://issues.apache.org/jira/browse/BEAM-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189426#comment-17189426 ] 

Beam JIRA Bot commented on BEAM-3353:
-------------------------------------

This issue was marked "stale-P2" and has not received a public comment in 14 days. It is now automatically moved to P3. If you are still affected by it, you can comment and move it back to P2.

> Prohibit stacked GBKs with accumulating mode
> --------------------------------------------
>
>                 Key: BEAM-3353
>                 URL: https://issues.apache.org/jira/browse/BEAM-3353
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core, sdk-py-core
>            Reporter: Eugene Kirpichov
>            Priority: P3
>
> The test in https://github.com/apache/beam/pull/4239 demonstrates that stacked GBKs with accumulating mode are unsafe, in the same way that stacked GBKs with merging windows are unsafe.
> In particular, consider the pipeline input -> (GBK onto N keys) -> ungroup -> (GBK onto 1 key) -> ungroup. Suppose the first GBK receives "a" and then "b": it emits "a" and then "a","b". The second GBK then emits "a" and then "a","a","b", which is meaningless. With Combine instead of GBK, the same pipeline double-counts.
> There are cases where propagating accumulation through stacked aggregations is desirable, but propagating it by default is definitely wrong. Silently switching to discarding mode is likely wrong as well. So we should reset the windowing strategy and require the user to specify accumulating mode explicitly if they want it.
> All pipelines that currently do this compute meaningless results, so rejecting them should not be considered a breaking change. Still, we should find out how many such pipelines exist.
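
A minimal pure-Python sketch of the stacked-GBK example in the description. It does not use the Beam SDK. It assumes each GBK sees a single key and that its trigger fires once per incoming batch in accumulating mode. gbk_accumulating and sum_accumulating are hypothetical helpers that only model which elements each fired pane contains.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def gbk_accumulating(batches: Sequence[Sequence[T]]) -> List[List[T]]:
    """Model a single-key GroupByKey whose trigger fires once per input batch
    in ACCUMULATING mode: every fired pane re-emits all elements seen so far."""
    seen: List[T] = []
    panes: List[List[T]] = []
    for batch in batches:
        seen.extend(batch)
        panes.append(list(seen))
    return panes


def sum_accumulating(batches: Sequence[Sequence[int]]) -> List[int]:
    """Same firing model for a Combine (sum): each pane is the running total."""
    return [sum(pane) for pane in gbk_accumulating(batches)]


# Stacked GBKs as in the issue: "a" and then "b" reach the first GBK.
first = gbk_accumulating([["a"], ["b"]])
# Ungrouping forwards each fired pane as a new batch to the second GBK.
second = gbk_accumulating(first)
print(first)   # [['a'], ['a', 'b']]
print(second)  # [['a'], ['a', 'a', 'b']]

# Same shape with Combine: inputs 1 and then 2, true total 3.
first_sums = sum_accumulating([[1], [2]])                  # [1, 3]
second_sums = sum_accumulating([[s] for s in first_sums])  # [1, 4]
print(second_sums[-1])  # 4, not 3: the first pane is counted twice
```

The second stage accumulates panes that were already accumulated upstream, which is exactly where the extra "a" and the overcount come from.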

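One way the proposed check could look, as a hypothetical sketch over a toy linear pipeline model rather than Beam's actual validation code. SetWindowing, GroupByKey, StackedAccumulationError and validate are invented names. The model treats a GBK's output as having its accumulation mode reset, so a downstream GBK may only accumulate if the user re-specifies the mode explicitly; otherwise the pipeline is rejected.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SetWindowing:
    """Hypothetical step: the user (re)applies windowing and picks a mode."""
    accumulating: bool


@dataclass
class GroupByKey:
    """Hypothetical step: a grouping operation identified by name."""
    name: str


Step = Union[SetWindowing, GroupByKey]


class StackedAccumulationError(ValueError):
    """Raised when a GBK would silently inherit accumulating mode."""


def validate(steps: List[Step]) -> None:
    accumulating = False  # Beam's default accumulation mode is discarding
    explicit = False      # was the mode set by the user since the last GBK?
    for step in steps:
        if isinstance(step, SetWindowing):
            accumulating, explicit = step.accumulating, True
        else:
            if accumulating and not explicit:
                raise StackedAccumulationError(
                    f"{step.name}: accumulating mode inherited from an upstream "
                    "GroupByKey; specify it explicitly if intended")
            # After a GBK the mode counts as inherited, not user-specified.
            explicit = False


# Accepted: the user re-specifies accumulating mode before the second GBK.
validate([SetWindowing(True), GroupByKey("first"),
          SetWindowing(True), GroupByKey("second")])
```

Dropping the second SetWindowing(True) makes validate raise StackedAccumulationError for "second", matching the proposal to reject rather than silently switch to discarding.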


--
This message was sent by Atlassian Jira
(v8.3.4#803005)