Posted to dev@flink.apache.org by Aljoscha Krettek <al...@apache.org> on 2015/05/08 11:38:17 UTC

[DISCUSS] Naming and Functionality of Stream Operators and Tasks

Hi,
since I'm currently reworking the Stream operators I thought it's a
good time to talk about the naming of some classes. We have some
legacy problems with lots of Operators, OperatorBases, TwoInput,
OneInput, Unary, Binary, etc. And maybe we can break things in
streaming to have more consistent and future-proof naming.

In streaming, there are:
- Tasks, these are an AbstractInvokable and contain the main loop of a
streaming vertex. They read from the inputs and forward data to the
operator implementation.

- Operators, these are invoked by a Task and are responsible for the
actual logic of the operator. Think Map, Join, Reduce and so on. These
are responsible for calling the user-defined function.

- Operators (again, I know), these are user facing classes (some
derived from DataStream, some not). There is for example
SingleOutputStreamOperator, for the result of a DataStream
transformation that has a single output. There are also
TemporalOperator and its derived classes StreamCrossOperator and
StreamJoinOperator. The actual operator inside a task (the ones I
mentioned before that are responsible for the user logic) that
executes a temporal join is called CoStreamWindow (with a
JoinWindowFunction).
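
The split between a Task and the runtime Operator described above can be
sketched roughly as follows. This is only an illustration under my own
assumptions: SketchTask, SketchOperator, SketchMap and the method names are
hypothetical stand-ins, not the classes from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// All names in this block are hypothetical stand-ins, not Flink classes.

/** Runtime operator: holds the per-element logic. */
interface SketchOperator<IN, OUT> {
    void processElement(IN element, List<OUT> output);
}

/** Map-style operator that delegates each element to the user-defined function. */
class SketchMap<IN, OUT> implements SketchOperator<IN, OUT> {
    private final Function<IN, OUT> userFunction;

    SketchMap(Function<IN, OUT> userFunction) {
        this.userFunction = userFunction;
    }

    @Override
    public void processElement(IN element, List<OUT> output) {
        output.add(userFunction.apply(element));
    }
}

/** Task: owns the main loop, reads the input and forwards elements to the operator. */
class SketchTask<IN, OUT> {
    private final Iterator<IN> input;
    private final SketchOperator<IN, OUT> operator;

    SketchTask(Iterator<IN> input, SketchOperator<IN, OUT> operator) {
        this.input = input;
        this.operator = operator;
    }

    List<OUT> invoke() {
        List<OUT> output = new ArrayList<>();
        while (input.hasNext()) {
            operator.processElement(input.next(), output);
        }
        return output;
    }
}

class TaskLoopSketch {
    public static void main(String[] args) {
        SketchOperator<Integer, Integer> doubler = new SketchMap<Integer, Integer>(x -> x * 2);
        SketchTask<Integer, Integer> task =
                new SketchTask<Integer, Integer>(Arrays.asList(1, 2, 3).iterator(), doubler);
        System.out.println(task.invoke()); // prints [2, 4, 6]
    }
}
```

The task knows nothing about what the operator does, and the operator knows
nothing about where elements come from; the user function is only ever
called by the operator.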

As I currently have it in my PR, there are two Task classes, one for
single input, and one for two-input operators. There are also the
corresponding operator interfaces for unary and binary operators (see
what I did there ... :D).

What should we call all these classes (concepts)? Also I'm heavily in
favour of dropping all the Stream (or Streaming) prefixes and suffixes
from the class names. I know I'm in streaming because the package is
named streaming. And we should not restrain ourselves because the
batch API also has things called operator.

Also, the concept of one-input, two-input tasks and operators is not
very scalable. Maybe we should have a single interface for operators
that has a receiveElement(int, element) method that tells the operator
from which input an element came. Then we can scale this to n-ary
operators. This would of course have the overhead of always sending
along the number of the input instead of encoding the input number in
the method name, such as receiveElement1() and receiveElement2().
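
To make the two options concrete, here is a rough sketch. Only the method
names receiveElement(int, element), receiveElement1() and receiveElement2()
come from the paragraph above; the interface names, the adapter and the
zero-based input numbering are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; only the receiveElement* method names
// are taken from the email.

/** Variant A: the input number is encoded in the method name. */
interface TwoInputSketch<IN1, IN2> {
    void receiveElement1(IN1 element);
    void receiveElement2(IN2 element);
}

/** Variant B: one method for any number of inputs; the index travels with each element. */
interface NaryInputSketch {
    void receiveElement(int inputIndex, Object element);
}

/** Adapter that drives a variant A operator through variant B, at the cost of casts. */
class TwoInputAdapter<IN1, IN2> implements NaryInputSketch {
    private final TwoInputSketch<IN1, IN2> delegate;

    TwoInputAdapter(TwoInputSketch<IN1, IN2> delegate) {
        this.delegate = delegate;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void receiveElement(int inputIndex, Object element) {
        switch (inputIndex) {
            case 0: delegate.receiveElement1((IN1) element); break;
            case 1: delegate.receiveElement2((IN2) element); break;
            default: throw new IllegalArgumentException("no input " + inputIndex);
        }
    }
}

/** Example two-input operator that just records what it received. */
class LoggingJoin implements TwoInputSketch<String, Integer> {
    final List<String> log = new ArrayList<>();

    @Override
    public void receiveElement1(String element) { log.add("left:" + element); }

    @Override
    public void receiveElement2(Integer element) { log.add("right:" + element); }
}
```

In this sketch variant B scales to any number of inputs, but the element
arrives as Object, so the per-input types that variant A carries in IN1 and
IN2 are no longer checked by the compiler.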

Any thoughts? :D (I know I'm writing the long annoying emails today
but I think it is important we discuss these things before being stuck
with them.)

Cheers,
Aljoscha

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by Aljoscha Krettek <al...@apache.org>.
Every vote counts. :D

On Tue, May 12, 2015 at 11:04 AM, Matthias J. Sax
<mj...@informatik.hu-berlin.de> wrote:
> I like it. Not sure if my vote counts ;)

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by "Matthias J. Sax" <mj...@informatik.hu-berlin.de>.
I like it. Not sure if my vote counts ;)

On 05/12/2015 07:18 AM, Aljoscha Krettek wrote:
> My proposal for the runtime classes (per my Pull Request) is this:
> 
> StreamTask: base of streaming tasks, the task is the AbstractInvokable
> that runs in the TaskManager and invokes stream operators.
> OneInputStreamTask, TwoInputStreamTask and SourceStreamTask are the
> subclasses responsible for actual types of operations.
> 
> StreamOperator: interface for StreamOperators such as Map, Reduce and so on.
> OneInputStreamOperator and TwoInputStreamOperator are the interfaces for
> operators with one input and two inputs respectively.
> 
> There is also AbstractStreamOperator, which provides basic
> implementations for methods such as setup()/open()/close(), and
> AbstractUdfStreamOperator, which is derived from
> AbstractStreamOperator. This is for operators that have user code; it
> deals with calling the correct functions of RichUserFunctionS
> (open()/close()/setRuntimeContext()).
> 
> I realised that we should probably not rename all the actual operators
> and remove the Stream prefix and suffix; that would be too big a change
> and orthogonal to my current PR. Other people can do it if they want.
> 
> These are just my suggestions. Please suggest other consistent naming
> schemes if you think mine are bad.


Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by Aljoscha Krettek <al...@apache.org>.
My proposal for the runtime classes (per my Pull Request) is this:

StreamTask: base of streaming tasks, the task is the AbstractInvokable
that runs in the TaskManager and invokes stream operators.
OneInputStreamTask, TwoInputStreamTask and SourceStreamTask are the
subclasses responsible for actual types of operations.

StreamOperator: interface for StreamOperators such as Map, Reduce and so on.
OneInputStreamOperator and TwoInputStreamOperator are the interfaces for
operators with one input and two inputs respectively.

There is also AbstractStreamOperator, which provides basic
implementations for methods such as setup()/open()/close(), and
AbstractUdfStreamOperator, which is derived from
AbstractStreamOperator. This is for operators that have user code; it
deals with calling the correct functions of RichUserFunctionS
(open()/close()/setRuntimeContext()).
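
As a rough picture of how the operator side of this could hang together,
here is a small sketch. The type names follow the proposal (StreamMap is the
existing map operator mentioned elsewhere in this thread), but every method
signature other than the names setup/open/close is an assumption, and
RichFunctionSketch and LengthFunction are hypothetical stand-ins. The
setRuntimeContext() call and the task classes are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Type names follow the proposal; signatures are assumptions for this sketch.

interface StreamOperator {
    void setup();
    void open();
    void close();
}

interface OneInputStreamOperator<IN> extends StreamOperator {
    void processElement(IN element);    // hypothetical method name
}

interface TwoInputStreamOperator<IN1, IN2> extends StreamOperator {
    void processElement1(IN1 element);  // hypothetical method name
    void processElement2(IN2 element);  // hypothetical method name
}

/** Basic lifecycle implementations; records the calls so they can be inspected. */
abstract class AbstractStreamOperator implements StreamOperator {
    final List<String> lifecycle = new ArrayList<>();

    public void setup() { lifecycle.add("setup"); }
    public void open() { lifecycle.add("open"); }
    public void close() { lifecycle.add("close"); }
}

/** Hypothetical stand-in for a rich user function with its own lifecycle. */
interface RichFunctionSketch {
    void open();
    void close();
}

/** Operators with user code: forwards lifecycle calls to the user function. */
abstract class AbstractUdfStreamOperator<F> extends AbstractStreamOperator {
    protected final F userFunction;

    AbstractUdfStreamOperator(F userFunction) {
        this.userFunction = userFunction;
    }

    @Override
    public void open() {
        super.open();
        if (userFunction instanceof RichFunctionSketch) {
            ((RichFunctionSketch) userFunction).open();
        }
    }

    @Override
    public void close() {
        super.close();
        if (userFunction instanceof RichFunctionSketch) {
            ((RichFunctionSketch) userFunction).close();
        }
    }
}

/** Concrete one-input operator wrapping a map function. */
class StreamMap<IN, OUT> extends AbstractUdfStreamOperator<Function<IN, OUT>>
        implements OneInputStreamOperator<IN> {
    final List<OUT> emitted = new ArrayList<>();

    StreamMap(Function<IN, OUT> mapper) {
        super(mapper);
    }

    @Override
    public void processElement(IN element) {
        emitted.add(userFunction.apply(element));
    }
}

/** Example user function that records its lifecycle calls. */
class LengthFunction implements Function<String, Integer>, RichFunctionSketch {
    boolean opened;
    boolean closed;

    public void open() { opened = true; }
    public void close() { closed = true; }
    public Integer apply(String value) { return value.length(); }
}
```

A task would call setup() and open() once, then processElement() per
record, then close(); the user function only sees open()/close() because
the Udf base class forwards them.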

I realised that we should probably not rename all the actual operators
and remove the Stream prefix and suffix; that would be too big a change
and orthogonal to my current PR. Other people can do it if they want.

These are just my suggestions. Please suggest other consistent naming
schemes if you think mine are bad.

On Mon, May 11, 2015 at 9:40 PM, Stephan Ewen <se...@apache.org> wrote:
> How about separating the discussion about runtime class renaming (there
> seems to be consensus) from the API class renaming (no consensus yet)?
>
> To go ahead with the runtime classes, can you make a concrete suggestion
> for more memorable/descriptive names?
>
> For the API classes, kick off a thread, if you want, but please clearly
> mark in your discussion that this is about an API-breaking change
> to a user-facing API (that is still declared beta).

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by Stephan Ewen <se...@apache.org>.
How about separating the discussion about runtime class renaming (there
seems to be consensus) from the API class renaming (no consensus yet)?

To go ahead with the runtime classes, can you make a concrete suggestion
for more memorable/descriptive names?

For the API classes, kick off a thread, if you want, but please clearly
mark in your discussion that this is about an API-breaking change
to a user-facing API (that is still declared beta).


On Mon, May 11, 2015 at 10:18 AM, Aljoscha Krettek <al...@apache.org>
wrote:

> Come to think of it, why do we even need SingleOutputStreamOperator?
> It is just a subclass of DataStream that has almost no functionality
> that couldn't be implemented in DataStream. I think it makes people
> wonder why the result of a transformation is not a DataStream but this
> mouthful of a class.
>
> And, in light of other possibilities such as MapDriver and PactDriver, I
> am quite happy with calling the things StreamOperator and StreamMap.
> :D

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by Aljoscha Krettek <al...@apache.org>.
Come to think of it, why do we even need SingleOutputStreamOperator?
It is just a subclass of DataStream that has almost no functionality
that couldn't be implemented in DataStream. I think it makes people
wonder why the result of a transformation is not a DataStream but this
mouthful of a class.

And, in light of other possibilities such as MapDriver and PactDriver, I
am quite happy with calling the things StreamOperator and StreamMap.
:D

On Sat, May 9, 2015 at 5:20 PM, Márton Balassi <ba...@gmail.com> wrote:
> Hi,
>
> I am in favor of removing the Stream (or Streaming) suffixes and prefixes.
> I think that Gyula was also referring to those.
>
> I think the naming of the Tasks and the user-facing operators
> (SingleOutputStreamOperator and the like) is fine.
>
> As for the other bunch of Operators we could name them Drivers to be mostly
> in line with the batch naming. By the way, most of the classes do not have
> "Operator" in their name currently - e.g. the one encapsulating the map
> functionality is called StreamMap, however the base classes (StreamOperator
> and ChainableStreamOperator) have it in their name explicitly. I could go
> with MapDriver instead of StreamMap, ChainableStreamOperator will be
> eliminated anyway - StreamOperator needs a new name then: worst case
> scenario PactDriver. :)
>
> As for n-ary operators I agree with Gyula.
>
> On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <al...@apache.org>
> wrote:
>
>> Which name changes are you referring to? The proposed names in my
>> recent PR? Or the dropping of Stream from all the classes. For the
>> rest I was just rambling about how I don't like the names in the batch
>> API. :D
>>
>> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gy...@gmail.com> wrote:
>> > Generally I am in favor of making these name changes. My only concern is
>> > regarding to the one-input and multiple inputs operators.
>> >
>> > There is a general problem with the n-ary operators regarding type safety,
>> > that's why we now have SingleInput and Co (two-input) operators. I think we
>> > should keep these.
>> >
>> > On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <al...@apache.org>
>> > wrote:
>> >
>> >> Hi,
>> >> since I'm currently reworking the Stream operators I thought it's a
>> >> good time to talk about the naming of some classes. We have some
>> >> legacy problems with lots of Operators, OperatorBases, TwoInput,
>> >> OneInput, Unary, Binary, etc. And maybe we can break things in
>> >> streaming to have more consistent and future-proof naming.
>> >>
>> >> In streaming, there are:
>> >> - Tasks, these are an AbstractInvokabe and contain the main loop of a
>> >> streaming vertex. They read from the inputs and forward data to the
>> >> operator implementation.
>> >>
>> >> - Operators, these are invoked by a Task and are responsible for the
>> >> actual logic of the operator. Think Map, Join, Reduce and so on. These
>> >> are responsible for calling the user-defined function.
>> >>
>> >> - Operators (again, I know), these are user facing classes (some
>> >> derived from DataStream, some not). There is for example
>> >> SingleOutputStreamOperator, for the result of a DataStream
>> >> transformation that has a single output. There are also
>> >> TemporalOperator and its derived classes StreamCrossOperator and
>> >> StreamJoinOperator. The actual operator inside a task (the ones I
>> >> mentioned before that are responsible for the user logic) that
>> >> executes a temporal join is called CoStreamWindow (with a
>> >> JoinWindowFunction).
>> >>
>> >> As I currently have it in my PR, there are two Task classes, one for
>> >> single input, and one for two-input operators. There are also the
>> >> corresponding operator interfaces for unary and binary operators (see
>> >> what I did there ... :D).
>> >>
>> >> What should we call all these classes (concepts). Also I'm heavily in
>> >> favour of dropping all the Stream (or Streaming) prefixes and suffixes
>> >> from the class names. I know I'm in streaming because the package is
>> >> named streaming. And we should not restrain ourselves because the
>> >> batch API also has things called operator.
>> >>
>> >> Also, the concept of one-input, two-input tasks and operators is not
>> >> very scalable, Maybe we should have a single interface for operators
>> >> that has a receiveElement(int, element) method that tells the operator
>> >> from which input an element came. Then we can scale this to n-ary
>> >> operators. This would of course have the overhead of always sending
>> >> along the number of the input instead of encoding the input number in
>> >> the method name, such as receiveElement1() and receiveElement2().
>> >>
>> >> Any thoughts? :D (I know I'm writing the long annoying emails today
>> >> but I think it is important we discuss these things before being stuck
>> >> with them.)
>> >>
>> >> Cheers,
>> >> Aljoscha
>> >>
>>

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by Márton Balassi <ba...@gmail.com>.
Hi,

I am in favor of removing the Stream (or Streaming) suffixes and prefixes.
I think that Gyula was also referring to those.

I think the naming of the Tasks and the user-facing operators
(SingleOutputStreamOperator and the like) is fine.

As for the other bunch of Operators, we could name them Drivers to be mostly
in line with the batch naming. By the way, most of the classes do not have
"Operator" in their name currently, e.g. the one encapsulating the map
functionality is called StreamMap; however, the base classes (StreamOperator
and ChainableStreamOperator) have it in their name explicitly. I could go
with MapDriver instead of StreamMap. ChainableStreamOperator will be
eliminated anyway; StreamOperator needs a new name then: worst case
scenario, PactDriver. :)

As for n-ary operators I agree with Gyula.

On Sat, May 9, 2015 at 4:44 PM, Aljoscha Krettek <al...@apache.org>
wrote:

> Which name changes are you referring to? The proposed names in my
> recent PR? Or the dropping of Stream from all the classes. For the
> rest I was just rambling about how I don't like the names in the batch
> API. :D
>
> On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gy...@gmail.com> wrote:
> > Generally I am in favor of making these name changes. My only concern is
> > regarding to the one-input and multiple inputs operators.
> >
> > There is a general problem with the n-ary operators regarding type safety,
> > that's why we now have SingleInput and Co (two-input) operators. I think we
> > should keep these.
> >
> > On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <al...@apache.org>
> > wrote:
> >
> >> Hi,
> >> since I'm currently reworking the Stream operators I thought it's a
> >> good time to talk about the naming of some classes. We have some
> >> legacy problems with lots of Operators, OperatorBases, TwoInput,
> >> OneInput, Unary, Binary, etc. And maybe we can break things in
> >> streaming to have more consistent and future-proof naming.
> >>
> >> In streaming, there are:
> >> - Tasks, these are an AbstractInvokabe and contain the main loop of a
> >> streaming vertex. They read from the inputs and forward data to the
> >> operator implementation.
> >>
> >> - Operators, these are invoked by a Task and are responsible for the
> >> actual logic of the operator. Think Map, Join, Reduce and so on. These
> >> are responsible for calling the user-defined function.
> >>
> >> - Operators (again, I know), these are user facing classes (some
> >> derived from DataStream, some not). There is for example
> >> SingleOutputStreamOperator, for the result of a DataStream
> >> transformation that has a single output. There are also
> >> TemporalOperator and its derived classes StreamCrossOperator and
> >> StreamJoinOperator. The actual operator inside a task (the ones I
> >> mentioned before that are responsible for the user logic) that
> >> executes a temporal join is called CoStreamWindow (with a
> >> JoinWindowFunction).
> >>
> >> As I currently have it in my PR, there are two Task classes, one for
> >> single input, and one for two-input operators. There are also the
> >> corresponding operator interfaces for unary and binary operators (see
> >> what I did there ... :D).
> >>
> >> What should we call all these classes (concepts). Also I'm heavily in
> >> favour of dropping all the Stream (or Streaming) prefixes and suffixes
> >> from the class names. I know I'm in streaming because the package is
> >> named streaming. And we should not restrain ourselves because the
> >> batch API also has things called operator.
> >>
> >> Also, the concept of one-input, two-input tasks and operators is not
> >> very scalable, Maybe we should have a single interface for operators
> >> that has a receiveElement(int, element) method that tells the operator
> >> from which input an element came. Then we can scale this to n-ary
> >> operators. This would of course have the overhead of always sending
> >> along the number of the input instead of encoding the input number in
> >> the method name, such as receiveElement1() and receiveElement2().
> >>
> >> Any thoughts? :D (I know I'm writing the long annoying emails today
> >> but I think it is important we discuss these things before being stuck
> >> with them.)
> >>
> >> Cheers,
> >> Aljoscha
> >>
>

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by Aljoscha Krettek <al...@apache.org>.
Which name changes are you referring to? The proposed names in my
recent PR? Or the dropping of Stream from all the classes? For the
rest I was just rambling about how I don't like the names in the batch
API. :D

On Fri, May 8, 2015 at 12:31 PM, Gyula Fóra <gy...@gmail.com> wrote:
> Generally I am in favor of making these name changes. My only concern is
> regarding to the one-input and multiple inputs operators.
>
> There is a general problem with the n-ary operators regarding type safety,
> thats why we now have SingleInput and Co (two-input) operators. I think we
> should keep these.
>
> On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <al...@apache.org>
> wrote:
>
>> Hi,
>> since I'm currently reworking the Stream operators I thought it's a
>> good time to talk about the naming of some classes. We have some
>> legacy problems with lots of Operators, OperatorBases, TwoInput,
>> OneInput, Unary, Binary, etc. And maybe we can break things in
>> streaming to have more consistent and future-proof naming.
>>
>> In streaming, there are:
>> - Tasks, these are an AbstractInvokabe and contain the main loop of a
>> streaming vertex. They read from the inputs and forward data to the
>> operator implementation.
>>
>> - Operators, these are invoked by a Task and are responsible for the
>> actual logic of the operator. Think Map, Join, Reduce and so on. These
>> are responsible for calling the user-defined function.
>>
>> - Operators (again, I know), these are user facing classes (some
>> derived from DataStream, some not). There is for example
>> SingleOutputStreamOperator, for the result of a DataStream
>> transformation that has a single output. There are also
>> TemporalOperator and its derived classes StreamCrossOperator and
>> StreamJoinOperator. The actual operator inside a task (the ones I
>> mentioned before that are responsible for the user logic) that
>> executes a temporal join is called CoStreamWindow (with a
>> JoinWindowFunction).
>>
>> As I currently have it in my PR, there are two Task classes, one for
>> single input, and one for two-input operators. There are also the
>> corresponding operator interfaces for unary and binary operators (see
>> what I did there ... :D).
>>
>> What should we call all these classes (concepts). Also I'm heavily in
>> favour of dropping all the Stream (or Streaming) prefixes and suffixes
>> from the class names. I know I'm in streaming because the package is
>> named streaming. And we should not restrain ourselves because the
>> batch API also has things called operator.
>>
>> Also, the concept of one-input, two-input tasks and operators is not
>> very scalable, Maybe we should have a single interface for operators
>> that has a receiveElement(int, element) method that tells the operator
>> from which input an element came. Then we can scale this to n-ary
>> operators. This would of course have the overhead of always sending
>> along the number of the input instead of encoding the input number in
>> the method name, such as receiveElement1() and receiveElement2().
>>
>> Any thoughts? :D (I know I'm writing the long annoying emails today
>> but I think it is important we discuss these things before being stuck
>> with them.)
>>
>> Cheers,
>> Aljoscha
>>

Re: [DISCUSS] Naming and Functionality of Stream Operators and Tasks

Posted by Gyula Fóra <gy...@gmail.com>.
Generally I am in favor of making these name changes. My only concern is
regarding the one-input and multiple-input operators.

There is a general problem with n-ary operators regarding type safety;
that's why we now have SingleInput and Co (two-input) operators. I think we
should keep these.
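
To make the type-safety concern concrete, here is a minimal Java sketch. It
is not the actual Flink code: TwoInputOperator, NaryOperator, TypedOperator
and UntypedOperator are hypothetical names made up for illustration. Only the
method names receiveElement1(), receiveElement2() and
receiveElement(int, element) come from the proposal in this thread. With one
method per input, each input keeps its own generic type and the compiler
rejects a wrongly typed element. With a single index-based method, the
element type collapses to Object, so the operator has to cast and a mismatch
only shows up at runtime.

```java
import java.util.ArrayList;
import java.util.List;

public class InputAritySketch {

    // Hypothetical two-input interface: one method per input, one type per input.
    interface TwoInputOperator<IN1, IN2> {
        void receiveElement1(IN1 element);
        void receiveElement2(IN2 element);
    }

    // Hypothetical n-ary interface: the input number is passed along with the
    // element, so the element can only be typed as Object.
    interface NaryOperator {
        void receiveElement(int inputIndex, Object element);
    }

    // Typed variant: passing an Integer to receiveElement1 does not compile.
    static class TypedOperator implements TwoInputOperator<String, Integer> {
        final List<String> out = new ArrayList<>();

        @Override
        public void receiveElement1(String element) {
            out.add("s:" + element);
        }

        @Override
        public void receiveElement2(Integer element) {
            out.add("i:" + (element + 1));
        }
    }

    // Index-based variant: same logic, but every branch needs a cast that can
    // fail at runtime if the wrong type arrives on an input.
    static class UntypedOperator implements NaryOperator {
        final List<String> out = new ArrayList<>();

        @Override
        public void receiveElement(int inputIndex, Object element) {
            switch (inputIndex) {
                case 0:
                    out.add("s:" + (String) element);
                    break;
                case 1:
                    out.add("i:" + ((Integer) element + 1));
                    break;
                default:
                    throw new IllegalArgumentException("unknown input " + inputIndex);
            }
        }
    }

    public static void main(String[] args) {
        TypedOperator typed = new TypedOperator();
        typed.receiveElement1("a");
        typed.receiveElement2(1);
        System.out.println(typed.out);

        UntypedOperator untyped = new UntypedOperator();
        untyped.receiveElement(0, "a");
        untyped.receiveElement(1, 1);
        System.out.println(untyped.out);

        try {
            // Compiles fine, but input 1 expects an Integer.
            untyped.receiveElement(1, "oops");
        } catch (ClassCastException e) {
            System.out.println("type error only detected at runtime");
        }
    }
}
```

The index-based variant scales to any number of inputs with one interface,
which is the appeal of the proposal; the price is the casts shown above plus
the input number travelling with every element.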

On Fri, May 8, 2015 at 11:38 AM, Aljoscha Krettek <al...@apache.org>
wrote:

> Hi,
> since I'm currently reworking the Stream operators I thought it's a
> good time to talk about the naming of some classes. We have some
> legacy problems with lots of Operators, OperatorBases, TwoInput,
> OneInput, Unary, Binary, etc. And maybe we can break things in
> streaming to have more consistent and future-proof naming.
>
> In streaming, there are:
> - Tasks, these are an AbstractInvokabe and contain the main loop of a
> streaming vertex. They read from the inputs and forward data to the
> operator implementation.
>
> - Operators, these are invoked by a Task and are responsible for the
> actual logic of the operator. Think Map, Join, Reduce and so on. These
> are responsible for calling the user-defined function.
>
> - Operators (again, I know), these are user facing classes (some
> derived from DataStream, some not). There is for example
> SingleOutputStreamOperator, for the result of a DataStream
> transformation that has a single output. There are also
> TemporalOperator and its derived classes StreamCrossOperator and
> StreamJoinOperator. The actual operator inside a task (the ones I
> mentioned before that are responsible for the user logic) that
> executes a temporal join is called CoStreamWindow (with a
> JoinWindowFunction).
>
> As I currently have it in my PR, there are two Task classes, one for
> single input, and one for two-input operators. There are also the
> corresponding operator interfaces for unary and binary operators (see
> what I did there ... :D).
>
> What should we call all these classes (concepts). Also I'm heavily in
> favour of dropping all the Stream (or Streaming) prefixes and suffixes
> from the class names. I know I'm in streaming because the package is
> named streaming. And we should not restrain ourselves because the
> batch API also has things called operator.
>
> Also, the concept of one-input, two-input tasks and operators is not
> very scalable, Maybe we should have a single interface for operators
> that has a receiveElement(int, element) method that tells the operator
> from which input an element came. Then we can scale this to n-ary
> operators. This would of course have the overhead of always sending
> along the number of the input instead of encoding the input number in
> the method name, such as receiveElement1() and receiveElement2().
>
> Any thoughts? :D (I know I'm writing the long annoying emails today
> but I think it is important we discuss these things before being stuck
> with them.)
>
> Cheers,
> Aljoscha
>