You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by "Eugene Kirpichov (JIRA)" <ji...@apache.org> on 2018/05/05 21:00:01 UTC

[jira] [Created] (BEAM-4242) Wait.on() is O(n)

Eugene Kirpichov created BEAM-4242:
--------------------------------------

             Summary: Wait.on() is O(n)
                 Key: BEAM-4242
                 URL: https://issues.apache.org/jira/browse/BEAM-4242
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov


Wait.on() uses a NeverTrigger and a Sample.any(1) as an implementation detail.
Unfortunately, Sample.any() relies on combiner lifting for performance - otherwise all values end up grouped onto the same worker which is not acceptable if the signal PCollection is large.

Not all runners support combiner lifting at all; and even those that do (e.g. Dataflow) don't guarantee it. In the case of a very large user's pipeline, combiner lifting was not performed because it's only supported for DefaultTrigger, but not for NeverTrigger.

This should be fixed by modifying Wait to not rely on combiner lifting for performance, e.g. by a "manual" precombine (emit 1 value per bundle).

CC: [~mkhadikov]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)