You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Fabian Hueske <fh...@gmail.com> on 2016/01/13 13:53:32 UTC

[DISCUSS] Remove Combinable Annotation from DataSet API

Hi everybody,



As part of the upcoming 1.0 release we are stabilizing Flink's APIs.

I would like to start a discussion about removing the Combinable annotation
from the DataSet API.



# The Current State:

The DataSet API offers two interface for combine functions, CombineFunction
and GroupCombineFunction, which can be added to a GroupReduceFunctions
(GRF).


However, implementing a combine interface does not make a GRF combinable.
In addition, the GRF class needs to be annotated with the Combinable
annotation. The RichGroupReduceFunction has a default implementation of
combine, which forwards the combine parameters to the reduce method. This
default implementation is not used, unless the class that extends RGRF has
the Combinable annotation.



In addition to the Combinable annotation, the DataSet API
GroupReduceOperator features the setCombinable(boolean) method to override
a missing or existing Combinable annotation.



# My Proposal:

I propose to remove the Combinable annotation because it is not required
and complicates the definition of combiners. Instead, the combine method of
a GroupReduceFunction should be executed if it implements one of the
combine function interfaces. This would require to remove the default
combine implementation of the RichGroupReduceFunction as well.

This would be an API breaking change and has a few implications.



# Possible Implementation:

There are a few ways to implement this change.

- Remove Combinable annotation or mark it deprecated (and keep effect)

- Remove default combine method from RichGroupReduceFunction or deprecate it



Approach 1:
- Remove Combinable annotation
- Remove default combine() method from RichGroupReduceFunction
- Effect:
    - All RichGroupReduceFunctions that do either have the Combinable
annotation or implement the combine method do not compile anymore
    - GroupReduceFunctions that have the Combinable annotation do not
compile anymore
    - GroupReduceFunctions that implement a combine interface without
having the annotation (and not being set to combinable during program
construction) will execute the previously not executed combine method. This
might change the behavior of the program. In worst case, the program might
silently produce wrong results (or crash) if the combiner implementation
was faulty. In best case, the program executes faster.



Approach 2:
- As Approach 1
- In addition extend both combine interfaces with a deprecated marker
method. This will ensure that all functions that implement a combinable
interface do not compile anymore and need to be fixed. This could prevent
the silent failure as in Approach 1, but would also cause an additional API
breaking change once the deprecated marker method is removed again.



Approach 3:
- Mark Combinable annotation deprecated
- Mark combine() method in RichGroupReduceFunction as deprecated
- Effect:
    - There'll be a couple of deprecation warnings.
    - We face the same problem with silent failures as in Approach 1.
    - We have to check if RichGroupReduceFunction's override combine or not
(can be done with reflection). If the method is not overridden we do not
execute it (unless there is a Combinable annotation) and we are fine. If it
is overridden and no Combinable annotation has been defined, we have the
same problem with silent failures as before.
    - After we remove the deprecated annotation and method, we have the
same effect as with Approach 1.



There are more alternatives, but these are the most viable, IMO.



I think, if we want to remove the combinable annotation, we should do it
now.

Given the three options, would go for Approach 1. Yes, breaks a lot of code
and yes there is the possibility of computing incorrect results. Approach 2
is safer but would mean another API breaking change in the future. Approach
3 comes with fewer breaking changes but has the same problem of silent
failures.

IMO, the breaking API changes of Approach 1 are even desirable because they
will make users aware that this feature changed.



What do you think?



Cheers, Fabian

Re: [DISCUSS] Remove Combinable Annotation from DataSet API

Posted by Fabian Hueske <fh...@gmail.com>.
OK, I think we got a clear picture here.

I'll update the corresponding JIRA issue FLINK-1045.

Thanks, Fabian

2016-01-14 18:53 GMT+01:00 Henry Saputra <he...@gmail.com>:

> +1 for approach #1
>
>
>
> - Henry
>
> On Thu, Jan 14, 2016 at 5:35 AM, Chiwan Park <ch...@apache.org>
> wrote:
>
> > I’m also for approach #1. Now is nice time to apply some API-breaking
> > changes.
> >
> > > On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek <al...@apache.org>
> > wrote:
> > >
> > > I’m also for Approach #1. I like simplifying things.
> > >> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <vasilikikalavri@gmail.com
> >
> > wrote:
> > >>
> > >> Hi,
> > >>
> > >> ​+1 for removing the Combinable annotation​. Approach 1 sounds like
> the
> > >> best option to me.
> > >>
> > >>
> > >> On 13 January 2016 at 14:11, Till Rohrmann <tr...@apache.org>
> > wrote:
> > >>
> > >>> Hi Fabian,
> > >>>
> > >>> thanks for bringing this issue up. I agree with you that it would be
> > nice
> > >>> to remove the Combinable annotation if it is not really needed. Now
> if
> > >>> people are not aware of the Combinable interface then they might
> think
> > that
> > >>> they are actually using combiners by simply implementing
> > CombineFunction.
> > >>>
> > >>>
> > >> ​has happened to me :S​
> > >>
> > >>
> > >>
> > >>> I would also be in favour of approach 1 combined with a migration
> guide
> > >>> where we highlight possible problems with a faulty combine
> > implementation.
> > >>>
> > >>>
> > >> Migration guide is a ​good idea​!
> > >>
> > >>
> > >>
> > >>> Cheers,
> > >>> Till
> > >>> ​
> > >>>
> > >>>
> > >> ​-Vasia.​
> > >>
> > >>
> > >>
> > >>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fh...@gmail.com>
> > wrote:
> > >>>
> > >>>> Hi everybody,
> > >>>>
> > >>>>
> > >>>>
> > >>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
> > >>>>
> > >>>> I would like to start a discussion about removing the Combinable
> > >>> annotation
> > >>>> from the DataSet API.
> > >>>>
> > >>>>
> > >>>>
> > >>>> # The Current State:
> > >>>>
> > >>>> The DataSet API offers two interface for combine functions,
> > >>> CombineFunction
> > >>>> and GroupCombineFunction, which can be added to a
> GroupReduceFunctions
> > >>>> (GRF).
> > >>>>
> > >>>>
> > >>>> However, implementing a combine interface does not make a GRF
> > combinable.
> > >>>> In addition, the GRF class needs to be annotated with the Combinable
> > >>>> annotation. The RichGroupReduceFunction has a default implementation
> > of
> > >>>> combine, which forwards the combine parameters to the reduce method.
> > This
> > >>>> default implementation is not used, unless the class that extends
> RGRF
> > >>> has
> > >>>> the Combinable annotation.
> > >>>>
> > >>>>
> > >>>>
> > >>>> In addition to the Combinable annotation, the DataSet API
> > >>>> GroupReduceOperator features the setCombinable(boolean) method to
> > >>> override
> > >>>> a missing or existing Combinable annotation.
> > >>>>
> > >>>>
> > >>>>
> > >>>> # My Proposal:
> > >>>>
> > >>>> I propose to remove the Combinable annotation because it is not
> > required
> > >>>> and complicates the definition of combiners. Instead, the combine
> > method
> > >>> of
> > >>>> a GroupReduceFunction should be executed if it implements one of the
> > >>>> combine function interfaces. This would require to remove the
> default
> > >>>> combine implementation of the RichGroupReduceFunction as well.
> > >>>>
> > >>>> This would be an API breaking change and has a few implications.
> > >>>>
> > >>>>
> > >>>>
> > >>>> # Possible Implementation:
> > >>>>
> > >>>> There are a few ways to implement this change.
> > >>>>
> > >>>> - Remove Combinable annotation or mark it deprecated (and keep
> effect)
> > >>>>
> > >>>> - Remove default combine method from RichGroupReduceFunction or
> > deprecate
> > >>>> it
> > >>>>
> > >>>>
> > >>>>
> > >>>> Approach 1:
> > >>>> - Remove Combinable annotation
> > >>>> - Remove default combine() method from RichGroupReduceFunction
> > >>>> - Effect:
> > >>>>   - All RichGroupReduceFunctions that do either have the Combinable
> > >>>> annotation or implement the combine method do not compile anymore
> > >>>>   - GroupReduceFunctions that have the Combinable annotation do not
> > >>>> compile anymore
> > >>>>   - GroupReduceFunctions that implement a combine interface without
> > >>>> having the annotation (and not being set to combinable during
> program
> > >>>> construction) will execute the previously not executed combine
> method.
> > >>> This
> > >>>> might change the behavior of the program. In worst case, the program
> > >>> might
> > >>>> silently produce wrong results (or crash) if the combiner
> > implementation
> > >>>> was faulty. In best case, the program executes faster.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Approach 2:
> > >>>> - As Approach 1
> > >>>> - In addition extend both combine interfaces with a deprecated
> marker
> > >>>> method. This will ensure that all functions that implement a
> > combinable
> > >>>> interface do not compile anymore and need to be fixed. This could
> > prevent
> > >>>> the silent failure as in Approach 1, but would also cause an
> > additional
> > >>> API
> > >>>> breaking change once the deprecated marker method is removed again.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Approach 3:
> > >>>> - Mark Combinable annotation deprecated
> > >>>> - Mark combine() method in RichGroupReduceFunction as deprecated
> > >>>> - Effect:
> > >>>>   - There'll be a couple of deprecation warnings.
> > >>>>   - We face the same problem with silent failures as in Approach 1.
> > >>>>   - We have to check if RichGroupReduceFunction's override combine
> or
> > >>> not
> > >>>> (can be done with reflection). If the method is not overridden we do
> > not
> > >>>> execute it (unless there is a Combinable annotation) and we are
> fine.
> > If
> > >>> it
> > >>>> is overridden and no Combinable annotation has been defined, we have
> > the
> > >>>> same problem with silent failures as before.
> > >>>>   - After we remove the deprecated annotation and method, we have
> the
> > >>>> same effect as with Approach 1.
> > >>>>
> > >>>>
> > >>>>
> > >>>> There are more alternatives, but these are the most viable, IMO.
> > >>>>
> > >>>>
> > >>>>
> > >>>> I think, if we want to remove the combinable annotation, we should
> do
> > it
> > >>>> now.
> > >>>>
> > >>>> Given the three options, would go for Approach 1. Yes, breaks a lot
> of
> > >>> code
> > >>>> and yes there is the possibility of computing incorrect results.
> > >>> Approach 2
> > >>>> is safer but would mean another API breaking change in the future.
> > >>> Approach
> > >>>> 3 comes with fewer breaking changes but has the same problem of
> silent
> > >>>> failures.
> > >>>>
> > >>>> IMO, the breaking API changes of Approach 1 are even desirable
> because
> > >>> they
> > >>>> will make users aware that this feature changed.
> > >>>>
> > >>>>
> > >>>>
> > >>>> What do you think?
> > >>>>
> > >>>>
> > >>>>
> > >>>> Cheers, Fabian
> > >>>>
> >
> > Regards,
> > Chiwan Park
> >
> >
> >
>

Re: [DISCUSS] Remove Combinable Annotation from DataSet API

Posted by Henry Saputra <he...@gmail.com>.
+1 for approach #1



- Henry

On Thu, Jan 14, 2016 at 5:35 AM, Chiwan Park <ch...@apache.org> wrote:

> I’m also for approach #1. Now is nice time to apply some API-breaking
> changes.
>
> > On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek <al...@apache.org>
> wrote:
> >
> > I’m also for Approach #1. I like simplifying things.
> >> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <va...@gmail.com>
> wrote:
> >>
> >> Hi,
> >>
> >> ​+1 for removing the Combinable annotation​. Approach 1 sounds like the
> >> best option to me.
> >>
> >>
> >> On 13 January 2016 at 14:11, Till Rohrmann <tr...@apache.org>
> wrote:
> >>
> >>> Hi Fabian,
> >>>
> >>> thanks for bringing this issue up. I agree with you that it would be
> nice
> >>> to remove the Combinable annotation if it is not really needed. Now if
> >>> people are not aware of the Combinable interface then they might think
> that
> >>> they are actually using combiners by simply implementing
> CombineFunction.
> >>>
> >>>
> >> ​has happened to me :S​
> >>
> >>
> >>
> >>> I would also be in favour of approach 1 combined with a migration guide
> >>> where we highlight possible problems with a faulty combine
> implementation.
> >>>
> >>>
> >> Migration guide is a ​good idea​!
> >>
> >>
> >>
> >>> Cheers,
> >>> Till
> >>> ​
> >>>
> >>>
> >> ​-Vasia.​
> >>
> >>
> >>
> >>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fh...@gmail.com>
> wrote:
> >>>
> >>>> Hi everybody,
> >>>>
> >>>>
> >>>>
> >>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
> >>>>
> >>>> I would like to start a discussion about removing the Combinable
> >>> annotation
> >>>> from the DataSet API.
> >>>>
> >>>>
> >>>>
> >>>> # The Current State:
> >>>>
> >>>> The DataSet API offers two interface for combine functions,
> >>> CombineFunction
> >>>> and GroupCombineFunction, which can be added to a GroupReduceFunctions
> >>>> (GRF).
> >>>>
> >>>>
> >>>> However, implementing a combine interface does not make a GRF
> combinable.
> >>>> In addition, the GRF class needs to be annotated with the Combinable
> >>>> annotation. The RichGroupReduceFunction has a default implementation
> of
> >>>> combine, which forwards the combine parameters to the reduce method.
> This
> >>>> default implementation is not used, unless the class that extends RGRF
> >>> has
> >>>> the Combinable annotation.
> >>>>
> >>>>
> >>>>
> >>>> In addition to the Combinable annotation, the DataSet API
> >>>> GroupReduceOperator features the setCombinable(boolean) method to
> >>> override
> >>>> a missing or existing Combinable annotation.
> >>>>
> >>>>
> >>>>
> >>>> # My Proposal:
> >>>>
> >>>> I propose to remove the Combinable annotation because it is not
> required
> >>>> and complicates the definition of combiners. Instead, the combine
> method
> >>> of
> >>>> a GroupReduceFunction should be executed if it implements one of the
> >>>> combine function interfaces. This would require to remove the default
> >>>> combine implementation of the RichGroupReduceFunction as well.
> >>>>
> >>>> This would be an API breaking change and has a few implications.
> >>>>
> >>>>
> >>>>
> >>>> # Possible Implementation:
> >>>>
> >>>> There are a few ways to implement this change.
> >>>>
> >>>> - Remove Combinable annotation or mark it deprecated (and keep effect)
> >>>>
> >>>> - Remove default combine method from RichGroupReduceFunction or
> deprecate
> >>>> it
> >>>>
> >>>>
> >>>>
> >>>> Approach 1:
> >>>> - Remove Combinable annotation
> >>>> - Remove default combine() method from RichGroupReduceFunction
> >>>> - Effect:
> >>>>   - All RichGroupReduceFunctions that do either have the Combinable
> >>>> annotation or implement the combine method do not compile anymore
> >>>>   - GroupReduceFunctions that have the Combinable annotation do not
> >>>> compile anymore
> >>>>   - GroupReduceFunctions that implement a combine interface without
> >>>> having the annotation (and not being set to combinable during program
> >>>> construction) will execute the previously not executed combine method.
> >>> This
> >>>> might change the behavior of the program. In worst case, the program
> >>> might
> >>>> silently produce wrong results (or crash) if the combiner
> implementation
> >>>> was faulty. In best case, the program executes faster.
> >>>>
> >>>>
> >>>>
> >>>> Approach 2:
> >>>> - As Approach 1
> >>>> - In addition extend both combine interfaces with a deprecated marker
> >>>> method. This will ensure that all functions that implement a
> combinable
> >>>> interface do not compile anymore and need to be fixed. This could
> prevent
> >>>> the silent failure as in Approach 1, but would also cause an
> additional
> >>> API
> >>>> breaking change once the deprecated marker method is removed again.
> >>>>
> >>>>
> >>>>
> >>>> Approach 3:
> >>>> - Mark Combinable annotation deprecated
> >>>> - Mark combine() method in RichGroupReduceFunction as deprecated
> >>>> - Effect:
> >>>>   - There'll be a couple of deprecation warnings.
> >>>>   - We face the same problem with silent failures as in Approach 1.
> >>>>   - We have to check if RichGroupReduceFunction's override combine or
> >>> not
> >>>> (can be done with reflection). If the method is not overridden we do
> not
> >>>> execute it (unless there is a Combinable annotation) and we are fine.
> If
> >>> it
> >>>> is overridden and no Combinable annotation has been defined, we have
> the
> >>>> same problem with silent failures as before.
> >>>>   - After we remove the deprecated annotation and method, we have the
> >>>> same effect as with Approach 1.
> >>>>
> >>>>
> >>>>
> >>>> There are more alternatives, but these are the most viable, IMO.
> >>>>
> >>>>
> >>>>
> >>>> I think, if we want to remove the combinable annotation, we should do
> it
> >>>> now.
> >>>>
> >>>> Given the three options, would go for Approach 1. Yes, breaks a lot of
> >>> code
> >>>> and yes there is the possibility of computing incorrect results.
> >>> Approach 2
> >>>> is safer but would mean another API breaking change in the future.
> >>> Approach
> >>>> 3 comes with fewer breaking changes but has the same problem of silent
> >>>> failures.
> >>>>
> >>>> IMO, the breaking API changes of Approach 1 are even desirable because
> >>> they
> >>>> will make users aware that this feature changed.
> >>>>
> >>>>
> >>>>
> >>>> What do you think?
> >>>>
> >>>>
> >>>>
> >>>> Cheers, Fabian
> >>>>
>
> Regards,
> Chiwan Park
>
>
>

Re: [DISCUSS] Remove Combinable Annotation from DataSet API

Posted by Chiwan Park <ch...@apache.org>.
I’m also for approach #1. Now is nice time to apply some API-breaking changes.

> On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek <al...@apache.org> wrote:
> 
> I’m also for Approach #1. I like simplifying things.
>> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <va...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> ​+1 for removing the Combinable annotation​. Approach 1 sounds like the
>> best option to me.
>> 
>> 
>> On 13 January 2016 at 14:11, Till Rohrmann <tr...@apache.org> wrote:
>> 
>>> Hi Fabian,
>>> 
>>> thanks for bringing this issue up. I agree with you that it would be nice
>>> to remove the Combinable annotation if it is not really needed. Now if
>>> people are not aware of the Combinable interface then they might think that
>>> they are actually using combiners by simply implementing CombineFunction.
>>> 
>>> 
>> ​has happened to me :S​
>> 
>> 
>> 
>>> I would also be in favour of approach 1 combined with a migration guide
>>> where we highlight possible problems with a faulty combine implementation.
>>> 
>>> 
>> Migration guide is a ​good idea​!
>> 
>> 
>> 
>>> Cheers,
>>> Till
>>> ​
>>> 
>>> 
>> ​-Vasia.​
>> 
>> 
>> 
>>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fh...@gmail.com> wrote:
>>> 
>>>> Hi everybody,
>>>> 
>>>> 
>>>> 
>>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>>>> 
>>>> I would like to start a discussion about removing the Combinable
>>> annotation
>>>> from the DataSet API.
>>>> 
>>>> 
>>>> 
>>>> # The Current State:
>>>> 
>>>> The DataSet API offers two interface for combine functions,
>>> CombineFunction
>>>> and GroupCombineFunction, which can be added to a GroupReduceFunctions
>>>> (GRF).
>>>> 
>>>> 
>>>> However, implementing a combine interface does not make a GRF combinable.
>>>> In addition, the GRF class needs to be annotated with the Combinable
>>>> annotation. The RichGroupReduceFunction has a default implementation of
>>>> combine, which forwards the combine parameters to the reduce method. This
>>>> default implementation is not used, unless the class that extends RGRF
>>> has
>>>> the Combinable annotation.
>>>> 
>>>> 
>>>> 
>>>> In addition to the Combinable annotation, the DataSet API
>>>> GroupReduceOperator features the setCombinable(boolean) method to
>>> override
>>>> a missing or existing Combinable annotation.
>>>> 
>>>> 
>>>> 
>>>> # My Proposal:
>>>> 
>>>> I propose to remove the Combinable annotation because it is not required
>>>> and complicates the definition of combiners. Instead, the combine method
>>> of
>>>> a GroupReduceFunction should be executed if it implements one of the
>>>> combine function interfaces. This would require to remove the default
>>>> combine implementation of the RichGroupReduceFunction as well.
>>>> 
>>>> This would be an API breaking change and has a few implications.
>>>> 
>>>> 
>>>> 
>>>> # Possible Implementation:
>>>> 
>>>> There are a few ways to implement this change.
>>>> 
>>>> - Remove Combinable annotation or mark it deprecated (and keep effect)
>>>> 
>>>> - Remove default combine method from RichGroupReduceFunction or deprecate
>>>> it
>>>> 
>>>> 
>>>> 
>>>> Approach 1:
>>>> - Remove Combinable annotation
>>>> - Remove default combine() method from RichGroupReduceFunction
>>>> - Effect:
>>>>   - All RichGroupReduceFunctions that do either have the Combinable
>>>> annotation or implement the combine method do not compile anymore
>>>>   - GroupReduceFunctions that have the Combinable annotation do not
>>>> compile anymore
>>>>   - GroupReduceFunctions that implement a combine interface without
>>>> having the annotation (and not being set to combinable during program
>>>> construction) will execute the previously not executed combine method.
>>> This
>>>> might change the behavior of the program. In worst case, the program
>>> might
>>>> silently produce wrong results (or crash) if the combiner implementation
>>>> was faulty. In best case, the program executes faster.
>>>> 
>>>> 
>>>> 
>>>> Approach 2:
>>>> - As Approach 1
>>>> - In addition extend both combine interfaces with a deprecated marker
>>>> method. This will ensure that all functions that implement a combinable
>>>> interface do not compile anymore and need to be fixed. This could prevent
>>>> the silent failure as in Approach 1, but would also cause an additional
>>> API
>>>> breaking change once the deprecated marker method is removed again.
>>>> 
>>>> 
>>>> 
>>>> Approach 3:
>>>> - Mark Combinable annotation deprecated
>>>> - Mark combine() method in RichGroupReduceFunction as deprecated
>>>> - Effect:
>>>>   - There'll be a couple of deprecation warnings.
>>>>   - We face the same problem with silent failures as in Approach 1.
>>>>   - We have to check if RichGroupReduceFunction's override combine or
>>> not
>>>> (can be done with reflection). If the method is not overridden we do not
>>>> execute it (unless there is a Combinable annotation) and we are fine. If
>>> it
>>>> is overridden and no Combinable annotation has been defined, we have the
>>>> same problem with silent failures as before.
>>>>   - After we remove the deprecated annotation and method, we have the
>>>> same effect as with Approach 1.
>>>> 
>>>> 
>>>> 
>>>> There are more alternatives, but these are the most viable, IMO.
>>>> 
>>>> 
>>>> 
>>>> I think, if we want to remove the combinable annotation, we should do it
>>>> now.
>>>> 
>>>> Given the three options, would go for Approach 1. Yes, breaks a lot of
>>> code
>>>> and yes there is the possibility of computing incorrect results.
>>> Approach 2
>>>> is safer but would mean another API breaking change in the future.
>>> Approach
>>>> 3 comes with fewer breaking changes but has the same problem of silent
>>>> failures.
>>>> 
>>>> IMO, the breaking API changes of Approach 1 are even desirable because
>>> they
>>>> will make users aware that this feature changed.
>>>> 
>>>> 
>>>> 
>>>> What do you think?
>>>> 
>>>> 
>>>> 
>>>> Cheers, Fabian
>>>> 

Regards,
Chiwan Park



Re: [DISCUSS] Remove Combinable Annotation from DataSet API

Posted by Aljoscha Krettek <al...@apache.org>.
I’m also for Approach #1. I like simplifying things.
> On 13 Jan 2016, at 14:25, Vasiliki Kalavri <va...@gmail.com> wrote:
> 
> Hi,
> 
> ​+1 for removing the Combinable annotation​. Approach 1 sounds like the
> best option to me.
> 
> 
> On 13 January 2016 at 14:11, Till Rohrmann <tr...@apache.org> wrote:
> 
>> Hi Fabian,
>> 
>> thanks for bringing this issue up. I agree with you that it would be nice
>> to remove the Combinable annotation if it is not really needed. Now if
>> people are not aware of the Combinable interface then they might think that
>> they are actually using combiners by simply implementing CombineFunction.
>> 
>> 
> ​has happened to me :S​
> 
> 
> 
>> I would also be in favour of approach 1 combined with a migration guide
>> where we highlight possible problems with a faulty combine implementation.
>> 
>> 
> Migration guide is a ​good idea​!
> 
> 
> 
>> Cheers,
>> Till
>> ​
>> 
>> 
> ​-Vasia.​
> 
> 
> 
>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fh...@gmail.com> wrote:
>> 
>>> Hi everybody,
>>> 
>>> 
>>> 
>>> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>>> 
>>> I would like to start a discussion about removing the Combinable
>> annotation
>>> from the DataSet API.
>>> 
>>> 
>>> 
>>> # The Current State:
>>> 
>>> The DataSet API offers two interface for combine functions,
>> CombineFunction
>>> and GroupCombineFunction, which can be added to a GroupReduceFunctions
>>> (GRF).
>>> 
>>> 
>>> However, implementing a combine interface does not make a GRF combinable.
>>> In addition, the GRF class needs to be annotated with the Combinable
>>> annotation. The RichGroupReduceFunction has a default implementation of
>>> combine, which forwards the combine parameters to the reduce method. This
>>> default implementation is not used, unless the class that extends RGRF
>> has
>>> the Combinable annotation.
>>> 
>>> 
>>> 
>>> In addition to the Combinable annotation, the DataSet API
>>> GroupReduceOperator features the setCombinable(boolean) method to
>> override
>>> a missing or existing Combinable annotation.
>>> 
>>> 
>>> 
>>> # My Proposal:
>>> 
>>> I propose to remove the Combinable annotation because it is not required
>>> and complicates the definition of combiners. Instead, the combine method
>> of
>>> a GroupReduceFunction should be executed if it implements one of the
>>> combine function interfaces. This would require to remove the default
>>> combine implementation of the RichGroupReduceFunction as well.
>>> 
>>> This would be an API breaking change and has a few implications.
>>> 
>>> 
>>> 
>>> # Possible Implementation:
>>> 
>>> There are a few ways to implement this change.
>>> 
>>> - Remove Combinable annotation or mark it deprecated (and keep effect)
>>> 
>>> - Remove default combine method from RichGroupReduceFunction or deprecate
>>> it
>>> 
>>> 
>>> 
>>> Approach 1:
>>> - Remove Combinable annotation
>>> - Remove default combine() method from RichGroupReduceFunction
>>> - Effect:
>>>    - All RichGroupReduceFunctions that do either have the Combinable
>>> annotation or implement the combine method do not compile anymore
>>>    - GroupReduceFunctions that have the Combinable annotation do not
>>> compile anymore
>>>    - GroupReduceFunctions that implement a combine interface without
>>> having the annotation (and not being set to combinable during program
>>> construction) will execute the previously not executed combine method.
>> This
>>> might change the behavior of the program. In worst case, the program
>> might
>>> silently produce wrong results (or crash) if the combiner implementation
>>> was faulty. In best case, the program executes faster.
>>> 
>>> 
>>> 
>>> Approach 2:
>>> - As Approach 1
>>> - In addition extend both combine interfaces with a deprecated marker
>>> method. This will ensure that all functions that implement a combinable
>>> interface do not compile anymore and need to be fixed. This could prevent
>>> the silent failure as in Approach 1, but would also cause an additional
>> API
>>> breaking change once the deprecated marker method is removed again.
>>> 
>>> 
>>> 
>>> Approach 3:
>>> - Mark Combinable annotation deprecated
>>> - Mark combine() method in RichGroupReduceFunction as deprecated
>>> - Effect:
>>>    - There'll be a couple of deprecation warnings.
>>>    - We face the same problem with silent failures as in Approach 1.
>>>    - We have to check if RichGroupReduceFunction's override combine or
>> not
>>> (can be done with reflection). If the method is not overridden we do not
>>> execute it (unless there is a Combinable annotation) and we are fine. If
>> it
>>> is overridden and no Combinable annotation has been defined, we have the
>>> same problem with silent failures as before.
>>>    - After we remove the deprecated annotation and method, we have the
>>> same effect as with Approach 1.
>>> 
>>> 
>>> 
>>> There are more alternatives, but these are the most viable, IMO.
>>> 
>>> 
>>> 
>>> I think, if we want to remove the combinable annotation, we should do it
>>> now.
>>> 
>>> Given the three options, would go for Approach 1. Yes, breaks a lot of
>> code
>>> and yes there is the possibility of computing incorrect results.
>> Approach 2
>>> is safer but would mean another API breaking change in the future.
>> Approach
>>> 3 comes with fewer breaking changes but has the same problem of silent
>>> failures.
>>> 
>>> IMO, the breaking API changes of Approach 1 are even desirable because
>> they
>>> will make users aware that this feature changed.
>>> 
>>> 
>>> 
>>> What do you think?
>>> 
>>> 
>>> 
>>> Cheers, Fabian
>>> 
>> 


Re: [DISCUSS] Remove Combinable Annotation from DataSet API

Posted by Vasiliki Kalavri <va...@gmail.com>.
Hi,

​+1 for removing the Combinable annotation​. Approach 1 sounds like the
best option to me.


On 13 January 2016 at 14:11, Till Rohrmann <tr...@apache.org> wrote:

> Hi Fabian,
>
> thanks for bringing this issue up. I agree with you that it would be nice
> to remove the Combinable annotation if it is not really needed. Now if
> people are not aware of the Combinable interface then they might think that
> they are actually using combiners by simply implementing CombineFunction.
>
>
​has happened to me :S​



> I would also be in favour of approach 1 combined with a migration guide
> where we highlight possible problems with a faulty combine implementation.
>
>
Migration guide is a ​good idea​!



> Cheers,
> Till
> ​
>
>
​-Vasia.​



> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fh...@gmail.com> wrote:
>
> > Hi everybody,
> >
> >
> >
> > As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
> >
> > I would like to start a discussion about removing the Combinable
> annotation
> > from the DataSet API.
> >
> >
> >
> > # The Current State:
> >
> > The DataSet API offers two interface for combine functions,
> CombineFunction
> > and GroupCombineFunction, which can be added to a GroupReduceFunctions
> > (GRF).
> >
> >
> > However, implementing a combine interface does not make a GRF combinable.
> > In addition, the GRF class needs to be annotated with the Combinable
> > annotation. The RichGroupReduceFunction has a default implementation of
> > combine, which forwards the combine parameters to the reduce method. This
> > default implementation is not used, unless the class that extends RGRF
> has
> > the Combinable annotation.
> >
> >
> >
> > In addition to the Combinable annotation, the DataSet API
> > GroupReduceOperator features the setCombinable(boolean) method to
> override
> > a missing or existing Combinable annotation.
> >
> >
> >
> > # My Proposal:
> >
> > I propose to remove the Combinable annotation because it is not required
> > and complicates the definition of combiners. Instead, the combine method
> of
> > a GroupReduceFunction should be executed if it implements one of the
> > combine function interfaces. This would require to remove the default
> > combine implementation of the RichGroupReduceFunction as well.
> >
> > This would be an API breaking change and has a few implications.
> >
> >
> >
> > # Possible Implementation:
> >
> > There are a few ways to implement this change.
> >
> > - Remove Combinable annotation or mark it deprecated (and keep effect)
> >
> > - Remove default combine method from RichGroupReduceFunction or deprecate
> > it
> >
> >
> >
> > Approach 1:
> > - Remove Combinable annotation
> > - Remove default combine() method from RichGroupReduceFunction
> > - Effect:
> >     - All RichGroupReduceFunctions that do either have the Combinable
> > annotation or implement the combine method do not compile anymore
> >     - GroupReduceFunctions that have the Combinable annotation do not
> > compile anymore
> >     - GroupReduceFunctions that implement a combine interface without
> > having the annotation (and not being set to combinable during program
> > construction) will execute the previously not executed combine method.
> This
> > might change the behavior of the program. In worst case, the program
> might
> > silently produce wrong results (or crash) if the combiner implementation
> > was faulty. In best case, the program executes faster.
> >
> >
> >
> > Approach 2:
> > - As Approach 1
> > - In addition extend both combine interfaces with a deprecated marker
> > method. This will ensure that all functions that implement a combinable
> > interface do not compile anymore and need to be fixed. This could prevent
> > the silent failure as in Approach 1, but would also cause an additional
> API
> > breaking change once the deprecated marker method is removed again.
> >
> >
> >
> > Approach 3:
> > - Mark Combinable annotation deprecated
> > - Mark combine() method in RichGroupReduceFunction as deprecated
> > - Effect:
> >     - There'll be a couple of deprecation warnings.
> >     - We face the same problem with silent failures as in Approach 1.
> >     - We have to check if RichGroupReduceFunction's override combine or
> not
> > (can be done with reflection). If the method is not overridden we do not
> > execute it (unless there is a Combinable annotation) and we are fine. If
> it
> > is overridden and no Combinable annotation has been defined, we have the
> > same problem with silent failures as before.
> >     - After we remove the deprecated annotation and method, we have the
> > same effect as with Approach 1.
> >
> >
> >
> > There are more alternatives, but these are the most viable, IMO.
> >
> >
> >
> > I think, if we want to remove the combinable annotation, we should do it
> > now.
> >
> > Given the three options, would go for Approach 1. Yes, breaks a lot of
> code
> > and yes there is the possibility of computing incorrect results.
> Approach 2
> > is safer but would mean another API breaking change in the future.
> Approach
> > 3 comes with fewer breaking changes but has the same problem of silent
> > failures.
> >
> > IMO, the breaking API changes of Approach 1 are even desirable because
> they
> > will make users aware that this feature changed.
> >
> >
> >
> > What do you think?
> >
> >
> >
> > Cheers, Fabian
> >
>

Re: [DISCUSS] Remove Combinable Annotation from DataSet API

Posted by Till Rohrmann <tr...@apache.org>.
Hi Fabian,

thanks for bringing this issue up. I agree with you that it would be nice
to remove the Combinable annotation if it is not really needed. Now if
people are not aware of the Combinable interface then they might think that
they are actually using combiners by simply implementing CombineFunction.

I would also be in favour of approach 1 combined with a migration guide
where we highlight possible problems with a faulty combine implementation.

Cheers,
Till
​

On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi everybody,
>
>
>
> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>
> I would like to start a discussion about removing the Combinable annotation
> from the DataSet API.
>
>
>
> # The Current State:
>
> The DataSet API offers two interface for combine functions, CombineFunction
> and GroupCombineFunction, which can be added to a GroupReduceFunctions
> (GRF).
>
>
> However, implementing a combine interface does not make a GRF combinable.
> In addition, the GRF class needs to be annotated with the Combinable
> annotation. The RichGroupReduceFunction has a default implementation of
> combine, which forwards the combine parameters to the reduce method. This
> default implementation is not used, unless the class that extends RGRF has
> the Combinable annotation.
>
>
>
> In addition to the Combinable annotation, the DataSet API
> GroupReduceOperator features the setCombinable(boolean) method to override
> a missing or existing Combinable annotation.
>
>
>
> # My Proposal:
>
> I propose to remove the Combinable annotation because it is not required
> and complicates the definition of combiners. Instead, the combine method of
> a GroupReduceFunction should be executed if it implements one of the
> combine function interfaces. This would require to remove the default
> combine implementation of the RichGroupReduceFunction as well.
>
> This would be an API breaking change and has a few implications.
>
>
>
> # Possible Implementation:
>
> There are a few ways to implement this change.
>
> - Remove Combinable annotation or mark it deprecated (and keep effect)
>
> - Remove default combine method from RichGroupReduceFunction or deprecate
> it
>
>
>
> Approach 1:
> - Remove Combinable annotation
> - Remove default combine() method from RichGroupReduceFunction
> - Effect:
>     - All RichGroupReduceFunctions that do either have the Combinable
> annotation or implement the combine method do not compile anymore
>     - GroupReduceFunctions that have the Combinable annotation do not
> compile anymore
>     - GroupReduceFunctions that implement a combine interface without
> having the annotation (and not being set to combinable during program
> construction) will execute the previously not executed combine method. This
> might change the behavior of the program. In worst case, the program might
> silently produce wrong results (or crash) if the combiner implementation
> was faulty. In best case, the program executes faster.
>
>
>
> Approach 2:
> - As Approach 1
> - In addition extend both combine interfaces with a deprecated marker
> method. This will ensure that all functions that implement a combinable
> interface do not compile anymore and need to be fixed. This could prevent
> the silent failure as in Approach 1, but would also cause an additional API
> breaking change once the deprecated marker method is removed again.
>
>
>
> Approach 3:
> - Mark Combinable annotation deprecated
> - Mark combine() method in RichGroupReduceFunction as deprecated
> - Effect:
>     - There'll be a couple of deprecation warnings.
>     - We face the same problem with silent failures as in Approach 1.
>     - We have to check if RichGroupReduceFunction's override combine or not
> (can be done with reflection). If the method is not overridden we do not
> execute it (unless there is a Combinable annotation) and we are fine. If it
> is overridden and no Combinable annotation has been defined, we have the
> same problem with silent failures as before.
>     - After we remove the deprecated annotation and method, we have the
> same effect as with Approach 1.
>
>
>
> There are more alternatives, but these are the most viable, IMO.
>
>
>
> I think, if we want to remove the combinable annotation, we should do it
> now.
>
> Given the three options, would go for Approach 1. Yes, breaks a lot of code
> and yes there is the possibility of computing incorrect results. Approach 2
> is safer but would mean another API breaking change in the future. Approach
> 3 comes with fewer breaking changes but has the same problem of silent
> failures.
>
> IMO, the breaking API changes of Approach 1 are even desirable because they
> will make users aware that this feature changed.
>
>
>
> What do you think?
>
>
>
> Cheers, Fabian
>