Posted to dev@beam.apache.org by Thomas Groh <tg...@google.com.INVALID> on 2017/04/11 20:56:13 UTC

Renaming SideOutput

Hey everyone:

I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK).

Having two methods, both named output (one that takes the "main output
type" and one that takes a tag to specify the type), communicates the
actual behavior more clearly: sideOutput isn't a "special" way to output,
it's the same as output(T), just to a specified PCollection. This will help
pipeline authors understand the actual behavior of outputting to a tag, and
disentangle it from "sideInput", which is a special way to receive input.
Giving them the same name means that it's not even strange to call output
and provide the main output type, which is what we want: it's a more
specific way to output, but does not have different restrictions or
capabilities.

This is also a pretty small change within the SDK - it touches about 20
files, and the changes are pretty automatic.
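
To make the before and after concrete, here is a minimal, self-contained
sketch. It does not use the Beam SDK: SketchTag, SketchContext, and
OutputRenameSketch are hypothetical stand-ins invented for illustration, and
the real DoFn context has more methods than this. The point is only that the
two overloads of output do the same thing, with the tag choosing the
destination collection.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a typed output tag (not Beam's own class).
final class SketchTag<T> {
    final String id;

    SketchTag(String id) {
        this.id = id;
    }
}

// Hypothetical stand-in for the context a DoFn's processing method writes to.
final class SketchContext<OutputT> {
    private final SketchTag<OutputT> mainTag;
    private final Map<String, List<Object>> collections = new HashMap<>();

    SketchContext(SketchTag<OutputT> mainTag) {
        this.mainTag = mainTag;
    }

    // output(T): emit to the main output.
    void output(OutputT value) {
        output(mainTag, value);
    }

    // output(tag, T): the same operation, aimed at the collection named by the tag.
    // Under the old naming this would have been sideOutput(tag, value).
    <T> void output(SketchTag<T> tag, T value) {
        collections.computeIfAbsent(tag.id, k -> new ArrayList<>()).add(value);
    }

    // Read back everything emitted for a tag.
    List<Object> elements(SketchTag<?> tag) {
        return collections.getOrDefault(tag.id, new ArrayList<>());
    }
}

public class OutputRenameSketch {
    static final SketchTag<String> WORDS = new SketchTag<>("words");
    static final SketchTag<Integer> LENGTHS = new SketchTag<>("lengths");

    // Plays the role of a DoFn body with one main and one additional output.
    static void process(String word, SketchContext<String> c) {
        c.output(word.toUpperCase());                      // main output, no tag needed
        c.output(LENGTHS, Integer.valueOf(word.length())); // additional output via its tag
        c.output(WORDS, word);                             // the main output's tag works too
    }

    public static void main(String[] args) {
        SketchContext<String> c = new SketchContext<>(WORDS);
        process("beam", c);
        System.out.println(c.elements(WORDS));   // prints [BEAM, beam]
        System.out.println(c.elements(LENGTHS)); // prints [4]
    }
}
```

In this sketch the untagged overload simply delegates to the tagged one, so
calling output with the main output's tag is not a special case.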

Thanks,

Thomas

Re: Renaming SideOutput

Posted by Thomas Groh <tg...@google.com.INVALID>.
All outputs are logically the same "type" of output. Within the SDKs, the
"main" output is just the one that is output if no output tag is specified,
and the one that matches the output type parameter of the DoFn. However,
because there are multiple methods, including something that is a
reasonable default, I think it's reasonable to distinguish between the two
"methods" of outputting, while still just calling everything an "output".
Having a main output does reduce the amount of code required for a DoFn
that produces only a single output quite significantly, which is why it's
still around.

Within the language-independent representations and most runners, there's
no actual concept of main-vs-side outputs, except to support the default
output(OutputT) method.
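
As a rough illustration of that point, the sketch below models a transform's
outputs as one map keyed by tag, with the untagged output(value) as shorthand
for the default tag. OutputsByTag, DEFAULT_TAG, and MainOutputSketch are
assumed names for this example only; this is not how any particular runner or
the Beam SDK is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a tag-keyed output set; names are illustrative only.
final class OutputsByTag {
    static final String DEFAULT_TAG = "main"; // assumed id for the default output

    private final Map<String, List<Object>> byTag = new LinkedHashMap<>();

    // The one underlying operation: append a value to the collection for a tag.
    void output(String tag, Object value) {
        byTag.computeIfAbsent(tag, k -> new ArrayList<>()).add(value);
    }

    // Convenience overload: omitting the tag selects the default output.
    void output(Object value) {
        output(DEFAULT_TAG, value);
    }

    Map<String, List<Object>> asMap() {
        return byTag;
    }
}

public class MainOutputSketch {
    // A single-output "DoFn body" never has to declare or mention a tag.
    static OutputsByTag doubleAll(List<Integer> inputs) {
        OutputsByTag out = new OutputsByTag();
        for (int n : inputs) {
            out.output(n * 2);
        }
        return out;
    }

    public static void main(String[] args) {
        OutputsByTag out = doubleAll(Arrays.asList(1, 2, 3));
        out.output(OutputsByTag.DEFAULT_TAG, 8); // explicit tag, same collection
        System.out.println(out.asMap()); // prints {main=[2, 4, 6, 8]}
    }
}
```

The convenience overload is what keeps the single-output case short; drop it
and every call site would have to name a tag.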

On Wed, Apr 12, 2017 at 6:53 PM, Ankur Chauhan <an...@malloc64.com> wrote:

> This question may be obvious to others, but why is there a distinction
> between main output and additional outputs? Why not just have a simple list
> of outputs where the first one is the main one?
>
> -- AC

Re: Renaming SideOutput

Posted by Ankur Chauhan <an...@malloc64.com>.
This question may be obvious to others, but why is there a distinction between main output and additional outputs? Why not just have a simple list of outputs where the first one is the main one?

-- AC 


> On Apr 12, 2017, at 18:08, Melissa Pashniak <me...@google.com.INVALID> wrote:
> 
> I agree. I'll create a PR with the doc changes (the rename plus text changes
> to make things clearer). I know of at least two places where we refer to side
> outputs (the programming guide and the "Design your pipeline" page).

Re: Renaming SideOutput

Posted by Melissa Pashniak <me...@google.com.INVALID>.
I agree. I'll create a PR with the doc changes (the rename plus text changes
to make things clearer). I know of at least two places where we refer to side
outputs (the programming guide and the "Design your pipeline" page).


On Tue, Apr 11, 2017 at 5:34 PM, Thomas Groh <tg...@google.com.invalid>
wrote:

> I think that's a good idea. I would call the outputs of a ParDo the "Main
> Output" and "Additional Outputs" - it seems like an easy way to make it
> clear that there's one output that is always expected, and there may be
> more.

Re: Renaming SideOutput

Posted by Aviem Zur <av...@gmail.com>.
+1

On Wed, Apr 12, 2017 at 6:06 AM JingsongLee <lz...@aliyun.com> wrote:

> strong +1
> best,
> JingsongLee
> ------------------------------------------------------------------
> From: Tang Jijun (上海_技术部_数据平台_唐觊隽) <ta...@yhd.com>
> Time: 2017 Apr 12 (Wed) 10:39
> To: dev@beam.apache.org <de...@beam.apache.org>
> Subject: Re: Renaming SideOutput
> +1, this is clearer

Re: Renaming SideOutput

Posted by JingsongLee <lz...@aliyun.com>.
strong +1
best,
JingsongLee
------------------------------------------------------------------
From: Tang Jijun (上海_技术部_数据平台_唐觊隽) <ta...@yhd.com>
Time: 2017 Apr 12 (Wed) 10:39
To: dev@beam.apache.org <de...@beam.apache.org>
Subject: Re: Renaming SideOutput
+1, this is clearer



Re: Renaming SideOutput

Posted by "Tang Jijun (上海_技术部_数据平台_唐觊隽)" <ta...@yhd.com>.
+1, this is clearer


-----Original Message-----
From: Ankur Chauhan [mailto:ankur@malloc64.com]
Sent: April 12, 2017 10:36
To: dev@beam.apache.org
Subject: Re: Renaming SideOutput

+1, this is pretty much the topmost thing that I found odd when starting with the Beam model. It would definitely be more intuitive to have a consistent name.


Re: Renaming SideOutput

Posted by Ankur Chauhan <an...@malloc64.com>.
+1, this is pretty much the topmost thing that I found odd when starting with the Beam model. It would definitely be more intuitive to have a consistent name.


> On Apr 11, 2017, at 18:29, Aljoscha Krettek <al...@apache.org> wrote:
> 
> +1

Re: Renaming SideOutput

Posted by Aljoscha Krettek <al...@apache.org>.
+1

On Wed, Apr 12, 2017, at 02:34, Thomas Groh wrote:
> I think that's a good idea. I would call the outputs of a ParDo the "Main
> Output" and "Additional Outputs" - it seems like an easy way to make it
> clear that there's one output that is always expected, and there may be
> more.

Re: Renaming SideOutput

Posted by Thomas Groh <tg...@google.com.INVALID>.
Cool! I've filed https://issues.apache.org/jira/browse/BEAM-1949 and
authored https://github.com/apache/beam/pull/2512 to make this change.

On Tue, Apr 11, 2017 at 11:33 PM, Ted Yu <yu...@gmail.com> wrote:

> +1

Re: Renaming SideOutput

Posted by Ted Yu <yu...@gmail.com>.
+1

> On Apr 11, 2017, at 5:34 PM, Thomas Groh <tg...@google.com.INVALID> wrote:
> 
> I think that's a good idea. I would call the outputs of a ParDo the "Main
> Output" and "Additional Outputs" - it seems like an easy way to make it
> clear that there's one output that is always expected, and there may be
> more.

Re: Renaming SideOutput

Posted by Thomas Groh <tg...@google.com.INVALID>.
I think that's a good idea. I would call the outputs of a ParDo the "Main
Output" and "Additional Outputs" - it seems like an easy way to make it
clear that there's one output that is always expected, and there may be
more.

On Tue, Apr 11, 2017 at 5:29 PM, Robert Bradshaw <
robertwb@google.com.invalid> wrote:

> We should do some renaming in Python too. Right now we have
> SideOutputValue which I'd propose naming TaggedOutput or something
> like that.
>
> Should the docs change too?
> https://beam.apache.org/documentation/programming-guide/#transforms-sideio
>
> On Tue, Apr 11, 2017 at 5:25 PM, Kenneth Knowles <kl...@google.com.invalid>
> wrote:
> > +1 ditto about sideInput and sideOutput not actually being related
> >
> > On Tue, Apr 11, 2017 at 3:52 PM, Robert Bradshaw <
> > robertwb@google.com.invalid> wrote:
> >
> >> +1, I think this is a lot clearer.
> >>
> >> On Tue, Apr 11, 2017 at 2:24 PM, Stephen Sisk <si...@google.com.invalid>
> >> wrote:
> >> > strong +1 for changing the name away from sideOutput - the fact that
> >> > sideInput and sideOutput are not really related was definitely a source of
> >> > confusion for me when learning beam.
> >> >
> >> > S
> >> >
> >> > On Tue, Apr 11, 2017 at 1:56 PM Thomas Groh <tgroh@google.com.invalid>
> >> > wrote:
> >> >
> >> >> Hey everyone:
> >> >>
> >> >> I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK).
> >> >>
> >> >> Having two methods, both named output, one which takes the "main output
> >> >> type" and one that takes a tag to specify the type more clearly
> >> >> communicates the actual behavior - sideOutput isn't a "special" way to
> >> >> output, it's the same as output(T), just to a specified PCollection. This
> >> >> will help pipeline authors understand the actual behavior of outputting to
> >> >> a tag, and detangle it from "sideInput", which is a special way to receive
> >> >> input. Giving them the same name means that it's not even strange to call
> >> >> output and provide the main output type, which is what we want - it's a
> >> >> more specific way to output, but does not have different restrictions or
> >> >> capabilities.
> >> >>
> >> >> This is also a pretty small change within the SDK - it touches about 20
> >> >> files, and the changes are pretty automatic.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> Thomas
> >> >>
> >>
>

Re: Renaming SideOutput

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
We should do some renaming in Python too. Right now we have
SideOutputValue which I'd propose naming TaggedOutput or something
like that.
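
To make the naming idea concrete, here is a minimal sketch of what a "tagged output" wrapper does. It is a toy model in plain Python, not the Beam Python SDK: the names TaggedOutput, MAIN_TAG, run_pardo and split_even_odd are assumptions made up for illustration. Values yielded bare go to the main output; values wrapped with a tag go to the output with that tag.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical name for the output used when no tag is given.
MAIN_TAG = "main"


@dataclass(frozen=True)
class TaggedOutput:
    """Toy stand-in for a value that should go to a named output."""
    tag: str
    value: Any


def run_pardo(fn: Callable[[Any], Iterable[Any]],
              elements: Iterable[Any]) -> Dict[str, List[Any]]:
    """Apply fn to each element and route every result by its tag."""
    outputs: Dict[str, List[Any]] = defaultdict(list)
    for element in elements:
        for result in fn(element):
            if isinstance(result, TaggedOutput):
                # Explicitly tagged: goes to the named output.
                outputs[result.tag].append(result.value)
            else:
                # Untagged: goes to the main output.
                outputs[MAIN_TAG].append(result)
    return dict(outputs)


def split_even_odd(n: int):
    """Even numbers go to the main output, odd numbers to the 'odd' tag."""
    if n % 2 == 0:
        yield n
    else:
        yield TaggedOutput("odd", n)


results = run_pardo(split_even_odd, [1, 2, 3, 4])
```

In this sketch the tagged and untagged paths differ only in which list receives the value, which is the point of calling the wrapper a "tagged" rather than a "side" output.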

Should the docs change too?
https://beam.apache.org/documentation/programming-guide/#transforms-sideio

On Tue, Apr 11, 2017 at 5:25 PM, Kenneth Knowles <kl...@google.com.invalid> wrote:
> +1 ditto about sideInput and sideOutput not actually being related
>
> On Tue, Apr 11, 2017 at 3:52 PM, Robert Bradshaw <
> robertwb@google.com.invalid> wrote:
>
>> +1, I think this is a lot clearer.
>>
>> On Tue, Apr 11, 2017 at 2:24 PM, Stephen Sisk <si...@google.com.invalid>
>> wrote:
>> > strong +1 for changing the name away from sideOutput - the fact that
>> > sideInput and sideOutput are not really related was definitely a source of
>> > confusion for me when learning beam.
>> >
>> > S
>> >
>> > On Tue, Apr 11, 2017 at 1:56 PM Thomas Groh <tg...@google.com.invalid>
>> > wrote:
>> >
>> >> Hey everyone:
>> >>
>> >> I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK).
>> >>
>> >> Having two methods, both named output, one which takes the "main output
>> >> type" and one that takes a tag to specify the type more clearly
>> >> communicates the actual behavior - sideOutput isn't a "special" way to
>> >> output, it's the same as output(T), just to a specified PCollection. This
>> >> will help pipeline authors understand the actual behavior of outputting to
>> >> a tag, and detangle it from "sideInput", which is a special way to receive
>> >> input. Giving them the same name means that it's not even strange to call
>> >> output and provide the main output type, which is what we want - it's a
>> >> more specific way to output, but does not have different restrictions or
>> >> capabilities.
>> >>
>> >> This is also a pretty small change within the SDK - it touches about 20
>> >> files, and the changes are pretty automatic.
>> >>
>> >> Thanks,
>> >>
>> >> Thomas
>> >>
>>

Re: Renaming SideOutput

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
+1 ditto about sideInput and sideOutput not actually being related

On Tue, Apr 11, 2017 at 3:52 PM, Robert Bradshaw <
robertwb@google.com.invalid> wrote:

> +1, I think this is a lot clearer.
>
> On Tue, Apr 11, 2017 at 2:24 PM, Stephen Sisk <si...@google.com.invalid>
> wrote:
> > strong +1 for changing the name away from sideOutput - the fact that
> > sideInput and sideOutput are not really related was definitely a source of
> > confusion for me when learning beam.
> >
> > S
> >
> > On Tue, Apr 11, 2017 at 1:56 PM Thomas Groh <tg...@google.com.invalid>
> > wrote:
> >
> >> Hey everyone:
> >>
> >> I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK).
> >>
> >> Having two methods, both named output, one which takes the "main output
> >> type" and one that takes a tag to specify the type more clearly
> >> communicates the actual behavior - sideOutput isn't a "special" way to
> >> output, it's the same as output(T), just to a specified PCollection. This
> >> will help pipeline authors understand the actual behavior of outputting to
> >> a tag, and detangle it from "sideInput", which is a special way to receive
> >> input. Giving them the same name means that it's not even strange to call
> >> output and provide the main output type, which is what we want - it's a
> >> more specific way to output, but does not have different restrictions or
> >> capabilities.
> >>
> >> This is also a pretty small change within the SDK - it touches about 20
> >> files, and the changes are pretty automatic.
> >>
> >> Thanks,
> >>
> >> Thomas
> >>
>

Re: Renaming SideOutput

Posted by Robert Bradshaw <ro...@google.com.INVALID>.
+1, I think this is a lot clearer.

On Tue, Apr 11, 2017 at 2:24 PM, Stephen Sisk <si...@google.com.invalid> wrote:
> strong +1 for changing the name away from sideOutput - the fact that
> sideInput and sideOutput are not really related was definitely a source of
> confusion for me when learning beam.
>
> S
>
> On Tue, Apr 11, 2017 at 1:56 PM Thomas Groh <tg...@google.com.invalid>
> wrote:
>
>> Hey everyone:
>>
>> I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK).
>>
>> Having two methods, both named output, one which takes the "main output
>> type" and one that takes a tag to specify the type more clearly
>> communicates the actual behavior - sideOutput isn't a "special" way to
>> output, it's the same as output(T), just to a specified PCollection. This
>> will help pipeline authors understand the actual behavior of outputting to
>> a tag, and detangle it from "sideInput", which is a special way to receive
>> input. Giving them the same name means that it's not even strange to call
>> output and provide the main output type, which is what we want - it's a
>> more specific way to output, but does not have different restrictions or
>> capabilities.
>>
>> This is also a pretty small change within the SDK - it touches about 20
>> files, and the changes are pretty automatic.
>>
>> Thanks,
>>
>> Thomas
>>

Re: Renaming SideOutput

Posted by Stephen Sisk <si...@google.com.INVALID>.
strong +1 for changing the name away from sideOutput - the fact that
sideInput and sideOutput are not really related was definitely a source of
confusion for me when learning beam.

S

On Tue, Apr 11, 2017 at 1:56 PM Thomas Groh <tg...@google.com.invalid>
wrote:

> Hey everyone:
>
> I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK).
>
> Having two methods, both named output, one which takes the "main output
> type" and one that takes a tag to specify the type more clearly
> communicates the actual behavior - sideOutput isn't a "special" way to
> output, it's the same as output(T), just to a specified PCollection. This
> will help pipeline authors understand the actual behavior of outputting to
> a tag, and detangle it from "sideInput", which is a special way to receive
> input. Giving them the same name means that it's not even strange to call
> output and provide the main output type, which is what we want - it's a
> more specific way to output, but does not have different restrictions or
> capabilities.
>
> This is also a pretty small change within the SDK - it touches about 20
> files, and the changes are pretty automatic.
>
> Thanks,
>
> Thomas
>

RE: Renaming SideOutput

Posted by "刘键(Basti Liu)" <ba...@alibaba-inc.com>.
+1. 
SideInput and SideOutput are likely to confuse new users, since the two behave differently.
BTW, would it also be better to rename "main output" to "default output", given that it is the output used when the user does not explicitly specify an output tag?

Regards
Jian Liu(Basti)

-----Original Message-----
From: Thomas Groh [mailto:tgroh@google.com.INVALID] 
Sent: Wednesday, April 12, 2017 4:56 AM
To: dev@beam.apache.org
Subject: Renaming SideOutput

Hey everyone:

I'd like to rename DoFn.Context#sideOutput to #output (in the Java SDK).

Having two methods, both named output, one which takes the "main output type" and one that takes a tag to specify the type more clearly communicates the actual behavior - sideOutput isn't a "special" way to output, it's the same as output(T), just to a specified PCollection. This will help pipeline authors understand the actual behavior of outputting to a tag, and detangle it from "sideInput", which is a special way to receive input. Giving them the same name means that it's not even strange to call output and provide the main output type, which is what we want - it's a more specific way to output, but does not have different restrictions or capabilities.
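
The proposal above can be sketched roughly as follows. This is a toy model in plain Python, not the Java SDK's DoFn.Context: the class SketchContext, its MAIN constant and the optional tag parameter are assumptions for illustration only (Java would use two overloads; Python has no overloading, so one method with an optional tag stands in for both). The point it tries to show is that outputting with a tag is the same operation as outputting without one, just aimed at a specified collection.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class SketchContext:
    """Toy context with one output method for main and tagged outputs."""

    # Hypothetical tag standing in for the main output.
    MAIN = "main"

    def __init__(self) -> None:
        # One list per tag; each list plays the role of a PCollection.
        self.collections: Dict[str, List[Any]] = defaultdict(list)

    def output(self, value: Any, tag: Optional[str] = None) -> None:
        """Append value to the tagged collection, or to main if no tag."""
        target = self.MAIN if tag is None else tag
        self.collections[target].append(value)


ctx = SketchContext()
ctx.output("ok line")                                  # main output, no tag
ctx.output("bad line", tag="errors")                   # an additional output
ctx.output("another ok line", tag=SketchContext.MAIN)  # main output, by tag
```

Calling output with the main tag lands in the same collection as calling it with no tag, so neither path has different restrictions or capabilities in this sketch.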

This is also a pretty small change within the SDK - it touches about 20 files, and the changes are pretty automatic.

Thanks,

Thomas