Posted to commits@beam.apache.org by "Davor Bonaci (JIRA)" <ji...@apache.org> on 2016/12/31 23:36:58 UTC

[jira] [Commented] (BEAM-1234) Consider a hint ParDo.withHighFanout()

    [ https://issues.apache.org/jira/browse/BEAM-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15790204#comment-15790204 ] 

Davor Bonaci commented on BEAM-1234:
------------------------------------

There are other alternatives too, e.g., an explicit checkpoint(), and there is the question of a hint vs. a requirement. At this point, I tend to prefer a required checkpoint, but a detailed analysis would be useful.

> Consider a hint ParDo.withHighFanout()
> --------------------------------------
>
>                 Key: BEAM-1234
>                 URL: https://issues.apache.org/jira/browse/BEAM-1234
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Davor Bonaci
>            Priority: Minor
>
> I find myself again and again advising users on StackOverflow to insert fusion breaks after high-fanout ParDos.
> I think we should just implement this as a hint on the ParDo and MapElements transforms, like the hints we already have in GroupByKey.fewKeys() or Combine.withHotKeyFanout().
> E.g.: c.apply(ParDo.of(some high-fanout DoFn).withHighFanout()), and a runner that implements fusion could decide to insert a runner-specific fusion break. This somewhat sidesteps the issues in https://issues.apache.org/jira/browse/BEAM-730 and https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E because every runner can decide how to do the right thing, or is free to ignore the hint.
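
The hint-versus-fusion-break idea in the quoted description can be illustrated with a toy model. This is only a sketch under stated assumptions: Step, fuse, and the highFanout flag are hypothetical names invented for illustration and are not Beam classes or methods, and withHighFanout() itself is only a proposal in this issue. The sketch shows one "runner" that honors the hint by ending the fused stage after a hinted step, and one that ignores it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HighFanoutHintSketch {

    /** Hypothetical stand-in for a ParDo-like step; NOT a Beam class. */
    static final class Step {
        final String name;
        final boolean highFanout; // models the proposed hint (assumption)

        Step(String name, boolean highFanout) {
            this.name = name;
            this.highFanout = highFanout;
        }
    }

    /**
     * Toy "runner" planning: fuses consecutive steps into stages.
     * If honorHint is true, a fusion break is placed after every hinted step;
     * if false, the hint is ignored and everything is fused into one stage.
     */
    static List<List<String>> fuse(List<Step> steps, boolean honorHint) {
        List<List<String>> stages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Step step : steps) {
            current.add(step.name);
            if (honorHint && step.highFanout) {
                // End the fused stage here so downstream work can be rebalanced.
                stages.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            stages.add(current);
        }
        return stages;
    }

    public static void main(String[] args) {
        List<Step> pipeline = Arrays.asList(
                new Step("ReadFiles", false),
                new Step("ExpandLines", true), // the high-fanout step
                new Step("Parse", false),
                new Step("Write", false));

        // Runner that honors the hint: two stages.
        System.out.println(fuse(pipeline, true));
        // prints [[ReadFiles, ExpandLines], [Parse, Write]]

        // Runner that ignores the hint: one fully fused stage.
        System.out.println(fuse(pipeline, false));
        // prints [[ReadFiles, ExpandLines, Parse, Write]]
    }
}
```

Because the break is decided inside fuse rather than by the pipeline author, each runner remains free to choose its own mechanism or to do nothing, which is the property the description relies on.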


