Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/07 22:30:00 UTC

[jira] [Work logged] (BEAM-4242) Wait.on() is O(n)

     [ https://issues.apache.org/jira/browse/BEAM-4242?focusedWorklogId=99240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-99240 ]

ASF GitHub Bot logged work on BEAM-4242:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/May/18 22:29
            Start Date: 07/May/18 22:29
    Worklog Time Spent: 10m 
      Work Description: jkff opened a new pull request #5296: [BEAM-4242] Fixes O(n) complexity of Wait.on() due to combiner lifting
URL: https://github.com/apache/beam/pull/5296
 
 
   * Fixes Wait implementation to not depend on combiner lifting happening
   * Fixes the condition for combiner lifting to only account for the trigger in streaming
   
   R: @chamikaramj 
   CC: @mairbek 



Issue Time Tracking
-------------------

            Worklog Id:     (was: 99240)
            Time Spent: 10m
    Remaining Estimate: 0h

> Wait.on() is O(n)
> -----------------
>
>                 Key: BEAM-4242
>                 URL: https://issues.apache.org/jira/browse/BEAM-4242
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Wait.on() uses a NeverTrigger and a Sample.any(1) as an implementation detail.
> Unfortunately, Sample.any() relies on combiner lifting for performance. Without it, all values end up grouped onto the same worker, which is not acceptable if the signal PCollection is large.
> Not all runners support combiner lifting, and even those that do (e.g. Dataflow) don't guarantee it. In one user's very large pipeline, combiner lifting was not performed because it is supported only for DefaultTrigger, not for NeverTrigger.
> This should be fixed by modifying Wait to not rely on combiner lifting for performance, e.g. by a "manual" precombine (emit 1 value per bundle).
> CC: [~mkhadikov]
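
The "manual" precombine idea in the description can be illustrated with a small plain-Java simulation. This is only a sketch under stated assumptions, not Beam code and not the change made in the pull request: bundles are modeled as plain lists, and the class and method names (PrecombineSketch, onePerBundle, totalElements) are hypothetical. It shows that emitting one value per bundle bounds the number of elements reaching the final single-worker grouping by the number of bundles rather than by the size of the signal collection.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: bundles are modeled as in-memory lists.
class PrecombineSketch {

    /** Emits at most one element per bundle: the first element, if the bundle is non-empty. */
    static <T> List<T> onePerBundle(List<List<T>> bundles) {
        List<T> out = new ArrayList<>();
        for (List<T> bundle : bundles) {
            if (!bundle.isEmpty()) {
                out.add(bundle.get(0));
            }
        }
        return out;
    }

    /** Counts the elements that would reach the final grouping step with no precombine. */
    static <T> int totalElements(List<List<T>> bundles) {
        int total = 0;
        for (List<T> bundle : bundles) {
            total += bundle.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Build 4 bundles of 250 elements each (1000 elements in total).
        List<List<Integer>> bundles = new ArrayList<>();
        int next = 0;
        for (int b = 0; b < 4; b++) {
            List<Integer> bundle = new ArrayList<>();
            for (int i = 0; i < 250; i++) {
                bundle.add(next++);
            }
            bundles.add(bundle);
        }
        System.out.println("without precombine: " + totalElements(bundles));
        System.out.println("with precombine: " + onePerBundle(bundles).size());
    }
}
```

In this model the final step sees 1000 elements without the precombine and 4 with it. In a real pipeline the per-bundle step would presumably live in a DoFn that tracks per-bundle state, which this sketch does not attempt to reproduce.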


