Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/11/03 13:44:00 UTC

[jira] [Work logged] (BEAM-10409) Add combiner packing to graph optimizer phases

     [ https://issues.apache.org/jira/browse/BEAM-10409?focusedWorklogId=506763&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506763 ]

ASF GitHub Bot logged work on BEAM-10409:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Nov/20 13:43
            Start Date: 03/Nov/20 13:43
    Worklog Time Spent: 10m 
      Work Description: yifanmai commented on pull request #13204:
URL: https://github.com/apache/beam/pull/13204#issuecomment-720641086


   R: @robertwb @tvalentyn 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 506763)
    Time Spent: 7h 50m  (was: 7h 40m)

> Add combiner packing to graph optimizer phases
> ----------------------------------------------
>
>                 Key: BEAM-10409
>                 URL: https://issues.apache.org/jira/browse/BEAM-10409
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-core
>            Reporter: Yifan Mai
>            Assignee: Yifan Mai
>            Priority: P2
>          Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Some use cases of Beam (e.g. [TensorFlow Transform|https://github.com/tensorflow/transform]) create thousands of Combine stages with a common parent. The large number of stages can cause performance issues on some runners. To alleviate this, a graph optimization phase could be added to the translations module that packs compatible Combine stages into a single stage.
> The graph optimization for CombinePerKey would work as follows: if several CombinePerKey stages share a common input, and each has exactly one input and one output, pack them into a single stage that runs all of the CombinePerKeys and outputs the resulting tuples to a new PCollection. A subsequent stage unpacks the tuples from this PCollection and sends the values to the original output PCollections.
> Supporting this for CombineGlobally raises an additional issue: because an intermediate KeyWithVoid stage sits between the input stage and each CombinePerKey stage, the CombinePerKey stages do not have a common input stage and cannot be packed. To support CombineGlobally, a common sibling elimination graph optimization phase can be used to merge the KeyWithVoid stages. After this, the CombinePerKey stages have a common input and can be packed.
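
The two phases described above can be illustrated with a toy model of a stage graph. This is only a sketch under stated assumptions: it does not use Beam or its translations module, and the names Stage, eliminate_common_siblings, is_packable and pack_combines are hypothetical, chosen for illustration. It models only the graph rewrite (which stages exist and which PCollections they read and write), not the packed CombineFn itself or any compatibility checks a real optimizer would need.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for a pipeline stage (not a Beam class)."""
    name: str
    kind: str        # e.g. "KeyWithVoid", "CombinePerKey", "Unpack"
    inputs: tuple    # names of input PCollections
    outputs: tuple   # names of output PCollections


def eliminate_common_siblings(stages, kind):
    """Merge single-output stages of `kind` that read the same inputs."""
    keeper_by_inputs = {}
    renamed = {}  # dropped output PCollection -> surviving PCollection
    kept = []
    for s in stages:
        if s.kind == kind and len(s.outputs) == 1:
            keeper = keeper_by_inputs.setdefault(s.inputs, s)
            if keeper is not s:
                renamed[s.outputs[0]] = keeper.outputs[0]
                continue  # drop the duplicate sibling
        kept.append(s)
    # Point consumers of dropped PCollections at the surviving one.
    return [
        replace(s, inputs=tuple(renamed.get(i, i) for i in s.inputs))
        for s in kept
    ]


def is_packable(s):
    """One input and one output each, as the description requires."""
    return (s.kind == "CombinePerKey"
            and len(s.inputs) == 1 and len(s.outputs) == 1)


def pack_combines(stages):
    """Replace CombinePerKey stages sharing an input with pack + unpack."""
    groups = defaultdict(list)
    for s in stages:
        if is_packable(s):
            groups[s.inputs[0]].append(s)
    result, done = [], set()
    for s in stages:
        group = groups[s.inputs[0]] if is_packable(s) else [s]
        if len(group) < 2:
            result.append(s)  # nothing to pack with
        elif s.inputs[0] not in done:
            done.add(s.inputs[0])
            names = "+".join(g.name for g in group)
            packed_pc = "packed(" + s.inputs[0] + ")"
            # One stage runs all combines and emits tuples of results.
            result.append(Stage("Packed[" + names + "]", "CombinePerKey",
                                s.inputs, (packed_pc,)))
            # A following stage routes each tuple slot to its original output.
            result.append(Stage("Unpack[" + names + "]", "Unpack",
                                (packed_pc,),
                                tuple(g.outputs[0] for g in group)))
    return result


# Two CombineGlobally-style branches, each with its own KeyWithVoid stage.
stages = [
    Stage("Read", "Read", (), ("pc_in",)),
    Stage("KeyWithVoid_1", "KeyWithVoid", ("pc_in",), ("keyed_1",)),
    Stage("KeyWithVoid_2", "KeyWithVoid", ("pc_in",), ("keyed_2",)),
    Stage("Mean", "CombinePerKey", ("keyed_1",), ("mean_out",)),
    Stage("Max", "CombinePerKey", ("keyed_2",), ("max_out",)),
]
optimized = pack_combines(eliminate_common_siblings(stages, "KeyWithVoid"))
for stage in optimized:
    print(stage.name, stage.inputs, "->", stage.outputs)
```

In this toy graph, calling pack_combines on the original stages alone leaves them unchanged, because the two CombinePerKey stages read different PCollections. Only after the duplicate KeyWithVoid stage is eliminated do they share an input and become packable, which mirrors the ordering argument made in the description.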



--
This message was sent by Atlassian Jira
(v8.3.4#803005)