Posted to dev@beam.apache.org by Daniel Oliveira <da...@google.com> on 2018/03/09 18:19:52 UTC

Design specs for portable Combine

Hi everyone, I'm going to be working on getting Combines working with
portable pipelines, and I've written up a design for how to model them. If
anyone's interested in portability please check it out and provide any
feedback you may have.

https://s.apache.org/beam-runner-api-combine-model

One part I'd particularly like community feedback on is the idea of
disabling Combiner lifting for Combines with side inputs. I mention it here
<https://docs.google.com/document/d/1-3mEs3Y7bIkJ0hmQ6SiHpVIFu5vbY6Zcpw-7tOMVg4U/edit#bookmark=id.ur8f96unbqx8>.
Please let me know if you have objections to that idea.

Thank you,
Daniel Oliveira

Re: Design specs for portable Combine

Posted by Daniel Oliveira <da...@google.com>.
So since I made some updates to the doc I feel like this is a good time to
add a summary (I didn't know I needed to do that when I originally sent it
out).

Structure and Lifting of Combines (In Apache Beam Portability)
This doc covers how Combines will be modeled in the Runner API and Fn API,
as well as how the model should be used to perform Combines in different
ways and how this model can be expanded on in the future. Some of the
important points:

   - Combines are modeled by having transforms with CombinePayloads and one
   of several URNs.
   - In the pipeline the Combine is a composite transform with its
   subtransforms describing the implementation.
   - URNs are provided for the Combine Per Key composite transform, and the
   steps Pre-Combine, Merge Accumulators, Extract Output.
   - Non-lifted Combines are implemented as a GroupByKey -> ParDo.
   - Lifted Combines are implemented as Pre-Combine -> GroupByKey -> Merge
   Accumulators -> Extract Output.
   - Side inputs are not described in the model, since Combines with side
   inputs can rarely be lifted. Such Combines are modeled as
   GroupByKey -> ParDo.
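
To make the two shapes in the list above concrete, here is a rough sketch in
plain Python (no Beam imports) that simulates a non-lifted Combine
(GroupByKey -> ParDo) and a lifted one (Pre-Combine -> GroupByKey -> Merge
Accumulators -> Extract Output). Everything in it is an assumption made for
illustration: MeanFn, group_by_key, pre_combine and the other helper names
are hypothetical, "bundles" are just Python lists, and none of this is the
Runner API or Fn API representation described in the doc.

```python
from collections import defaultdict


class MeanFn:
    """Hypothetical CombineFn-style object; the accumulator is (sum, count)."""

    def create_accumulator(self):
        return (0.0, 0)

    def add_input(self, acc, value):
        total, count = acc
        return (total + value, count + 1)

    def merge_accumulators(self, accs):
        total, count = 0.0, 0
        for t, c in accs:
            total += t
            count += c
        return (total, count)

    def extract_output(self, acc):
        total, count = acc
        return total / count if count else float("nan")


def group_by_key(pairs):
    """Stand-in for GroupByKey: collect all values seen for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def non_lifted_combine(fn, pairs):
    """GroupByKey -> ParDo: one step folds every raw value for a key."""
    out = {}
    for key, values in group_by_key(pairs).items():
        acc = fn.create_accumulator()
        for v in values:
            acc = fn.add_input(acc, v)
        out[key] = fn.extract_output(acc)
    return out


def pre_combine(fn, bundle):
    """Pre-Combine: fold one bundle's values into per-key accumulators."""
    accs = {}
    for key, value in bundle:
        acc = accs.get(key)
        if acc is None:
            acc = fn.create_accumulator()
        accs[key] = fn.add_input(acc, value)
    return list(accs.items())


def lifted_combine(fn, bundles):
    """Pre-Combine -> GroupByKey -> Merge Accumulators -> Extract Output."""
    partials = []
    for bundle in bundles:
        partials.extend(pre_combine(fn, bundle))
    # Only accumulators, not raw values, cross the GroupByKey.
    grouped = group_by_key(partials)
    merged = {k: fn.merge_accumulators(accs) for k, accs in grouped.items()}
    return {k: fn.extract_output(acc) for k, acc in merged.items()}
```

Under these assumptions both paths should produce the same per-key result
for the same input. The difference is that in the lifted form only the
partial accumulators pass through the GroupByKey, which is the point of
lifting.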


