Posted to user@flink.apache.org by Till Rohrmann <tr...@apache.org> on 2018/10/05 08:31:06 UTC

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Thanks Aljoscha for starting this discussion. The described problem does
indeed put us in a bit of a pickle. Even with option 1) I think it is
somewhat API breaking, because everyone who used lambdas without explicit
parameter types now needs to add them. Consequently, I only see two real
options out of the ones you've proposed:

1) Disambiguate the API (either by removing
reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out

Removing reduceGroup(GroupReduceFunction) in option 1 is a bit problematic,
because all Scala API users who have implemented a GroupReduceFunction would
then need to convert it into a Scala lambda. Moreover, I think it will be
problematic with RichGroupReduceFunction, which you need in order to get
access to the RuntimeContext.
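The RichGroupReduceFunction concern can be made concrete with a short sketch. This is a hedged illustration assuming the Flink batch API of that era; the class and the counter name "groups" are made up for the example:

```scala
import java.lang.{Iterable => JIterable}

import org.apache.flink.api.common.functions.RichGroupReduceFunction
import org.apache.flink.util.Collector

import scala.collection.JavaConverters._

// A reducer that also counts groups via an accumulator. The
// getRuntimeContext call is only available on the Rich* function classes,
// so this cannot be expressed as a plain Scala lambda passed to
// reduceGroup().
class CountingReducer extends RichGroupReduceFunction[String, (String, Int)] {
  override def reduce(values: JIterable[String],
                      out: Collector[(String, Int)]): Unit = {
    getRuntimeContext.getIntCounter("groups").add(1) // needs RuntimeContext
    val items = values.asScala.toList
    out.collect((items.head, items.size))
  }
}
```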

Maintaining two master branches puts a lot of burden onto the developers to
always keep the two branches in sync. Ideally I would like to avoid this.

I also played around a little bit with implicit conversions to add the
lambda methods in Scala 2.11 on demand, but I was not able to get it to work
smoothly.

I'm cross-posting this thread to the user list as well to get some more
user feedback.

Cheers,
Till

On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
wrote:

> The second alternative, with the addition of methods that take functions
> with Scala types, seems the most sensible.  I wonder if there is a need
> then to maintain the *J Java parameter methods, or whether users could just
> access the functionality by converting the Scala DataStreams to Java via
> .javaStream and whatever the equivalent is for DataSets.
>
> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
> wrote:
>
> > Hi,
> >
> > I'm currently working on https://issues.apache.org/jira/browse/FLINK-7811,
> > with the goal of adding support for Scala 2.12. There is a bit of a
> > hurdle and I have to explain some context first.
> >
> > With Scala 2.12, lambdas are implemented using the lambda mechanism of
> > Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
> > means that the following two method definitions can both take a lambda:
> >
> > def map[R](mapper: MapFunction[T, R]): DataSet[R]
> > def map[R](fun: T => R): DataSet[R]
> >
> > The Scala compiler gives precedence to the lambda version when you call
> > map() with a lambda in simple cases, so it works here. Even if the lambda
> > version of the method weren't there, you could still call map() with a
> > lambda, because the two are now considered the same. For Scala 2.11 we
> > need both signatures, though, to allow calling both with a lambda and
> > with a MapFunction.
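The interplay of the two overloads can be sketched without Flink. MapFunction below mirrors the shape of Flink's interface, and MiniDataSet is a made-up stand-in for DataSet, just to show the overload pair:

```scala
// Java-style SAM interface, mirroring Flink's MapFunction.
trait MapFunction[T, R] { def map(value: T): R }

// Made-up stand-in for DataSet.
class MiniDataSet[T](data: Seq[T]) {
  def map[R](mapper: MapFunction[T, R]): MiniDataSet[R] =
    new MiniDataSet(data.map(mapper.map))
  def map[R](fun: T => R): MiniDataSet[R] =
    new MiniDataSet(data.map(fun))
  def collect(): Seq[T] = data
}

object SamDemo extends App {
  val ds = new MiniDataSet(Seq(1, 2, 3))
  // Under Scala 2.12 both overloads can accept a lambda; the compiler
  // prefers the function-type overload in this simple case.
  println(ds.map((x: Int) => x * 2).collect()) // prints List(2, 4, 6)
  // An explicit anonymous class still selects the MapFunction overload.
  println(ds.map(new MapFunction[Int, Int] {
    def map(value: Int): Int = value + 1
  }).collect()) // prints List(2, 3, 4)
}
```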
> >
> > The problem is with more complicated method signatures, like:
> >
> > def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
> > DataSet[R]
> >
> > def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
> >
> > (for reference, GroupReduceFunction is a SAM with void
> > reduce(java.lang.Iterable<T> values, Collector<O> out))
> >
> > These two signatures are not the same, but they are similar enough for
> > the Scala 2.12 compiler to "get confused". In Scala 2.11, I could call
> > reduceGroup() with a lambda that doesn't have parameter type definitions
> > and things would be fine. With Scala 2.12 I can't do that, because the
> > compiler can't figure out which method to call and requires explicit type
> > definitions on the lambda parameters.
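The ambiguity can be reproduced in the same self-contained spirit. GroupReduce and MiniGrouped below are made-up stand-ins for GroupReduceFunction and a grouped data set; the point is the workaround of spelling out the lambda's parameter types:

```scala
import java.lang.{Iterable => JIterable}

import scala.collection.JavaConverters._

// Java-style SAM, mirroring the shape of Flink's GroupReduceFunction.
trait GroupReduce[T, R] {
  def reduce(values: JIterable[T], out: java.util.List[R]): Unit
}

// Made-up stand-in for a grouped data set with the two overloads.
class MiniGrouped[T](groups: Seq[Seq[T]]) {
  def reduceGroup[R](fun: (Iterator[T], java.util.List[R]) => Unit): Seq[R] = {
    val out = new java.util.ArrayList[R]()
    groups.foreach(g => fun(g.iterator, out))
    out.asScala.toList
  }
  def reduceGroup[R](reducer: GroupReduce[T, R]): Seq[R] = {
    val out = new java.util.ArrayList[R]()
    groups.foreach(g => reducer.reduce(g.asJava, out))
    out.asScala.toList
  }
}

object AmbiguityDemo extends App {
  val grouped = new MiniGrouped(Seq(Seq("a", "a"), Seq("b")))
  // Under 2.12 an untyped lambda `(in, out) => ...` would be reported as
  // ambiguous here; explicit parameter types select the Scala overload.
  val sizes = grouped.reduceGroup {
    (in: Iterator[String], out: java.util.List[Int]) => out.add(in.size)
  }
  println(sizes) // prints List(2, 1)
}
```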
> >
> > I see some solutions for this:
> >
> > 1. Keep the methods as they are; this would force people to always
> > explicitly specify parameter types on their lambdas.
> >
> > 2. Rename the second method to reduceGroupJ() to signal that it takes a
> > user function that uses Java-style interfaces (the first parameter is a
> > java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
> > disambiguates the code; users can use lambdas without specifying explicit
> > parameter types, but it breaks the API.
> >
> > One effect of 2. would be that we can add a reduceGroup() method that
> > takes an api.scala.GroupReduceFunction with proper Scala types, thus
> > allowing people to implement user functions without having to cast the
> > various Iterator/Iterable parameters.
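The api.scala.GroupReduceFunction mentioned here might look like the following sketch; the trait's exact shape is an assumption based on this mail, not a released Flink API:

```scala
import org.apache.flink.util.Collector

// Hypothetical Scala-typed counterpart of GroupReduceFunction: the same
// SAM shape, but over scala.Iterator, so user code needs no casting
// between Iterable and Iterator.
trait GroupReduceFunction[T, R] {
  def reduce(values: Iterator[T], out: Collector[R]): Unit
}

// A user implementation works directly with Scala types:
class SizeReducer extends GroupReduceFunction[String, Int] {
  def reduce(values: Iterator[String], out: Collector[Int]): Unit =
    out.collect(values.size)
}
```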
> >
> > Either way, people would have to adapt their code when moving to Scala
> > 2.12 in some way, depending on what style of methods they use.
> >
> > There is also solution 2.5:
> >
> > 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
> > old method names for Scala 2.11. This would require some infrastructure,
> > and I don't yet know how it can be done in a sane way.
> >
> > What do you think? I personally would be in favour of 2., but it breaks
> > the existing API.
> >
> > Best,
> > Aljoscha
> >
> >
> >
> >
>

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Aljoscha Krettek <al...@apache.org>.
Yes, but I think we would pretty much have to do that. I don't think we can stop doing 2.11 releases.

> On 8. Oct 2018, at 15:37, Chesnay Schepler <ch...@apache.org> wrote:
> 
> The infrastructure would only be required if we opt for releasing 2.11 and 2.12 builds simultaneously, correct?
> 
> On 08.10.2018 15:04, Aljoscha Krettek wrote:
>> Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.
>> 
>>> On 8. Oct 2018, at 13:00, Chesnay Schepler <ch...@apache.org> wrote:
>>> 
>>> And the remaining parts would only be about breaking the API?
>>> 
>>> On 08.10.2018 12:24, Aljoscha Krettek wrote:
>>>> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784
>>>> 
>>>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <ch...@apache.org> wrote:
>>>>> 
>>>>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
>>>>> wondering about the benefit, as the API break still has to happen at some point.
>>>>> 
>>>>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
>>>>> If this is the only blocker I suggest to make the breaking change in 1.8.
>>>>> 
>>>>> On 05.10.2018 10:31, Till Rohrmann wrote:
>>>>>> Thanks Aljoscha for starting this discussion. The described problem brings
>>>>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>>>>>> API breaking because everyone who used lambdas without types needs to add
>>>>>> them now. Consequently, I only see two real options out of the ones you've
>>>>>> proposed:
>>>>>> 
>>>>>> 1) Disambiguate the API (either by removing
>>>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>>>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>>>>>> 
>>>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>>>>>> problematic because then all Scala API users who have implemented a
>>>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>>>>>> think it will be problematic with RichGroupReduceFunction which you need to
>>>>>> get access to the RuntimeContext.
>>>>>> 
>>>>>> Maintaining two master branches puts a lot of burden onto the developers to
>>>>>> always keep the two branches in sync. Ideally I would like to avoid this.
>>>>>> 
>>>>>> I also played a little bit around with implicit conversions to add the
>>>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it work
>>>>>> smoothly.
>>>>>> 
>>>>>> I'm cross posting this thread to user as well to get some more user
>>>>>> feedback.
>>>>>> 
>>>>>> Cheers,
>>>>>> Till
>>>>>> 
>>>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> The second alternative, with the addition of methods that take functions
>>>>>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>>>>>> then to maintain the *J Java parameter methods, or whether users could just
>>>>>>> access the functionality by converting the Scala DataStreams to Java via
>>>>>>> .javaStream and whatever the equivalent is for DataSets.
>>>>>>> 
>>>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I'm currently working on
>>>>>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>>>>>> hurdle
>>>>>>>> and I have to explain some context first.
>>>>>>>> 
>>>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>>>>>> means that the following two method definitions can both take a lambda:
>>>>>>>> 
>>>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>>>>>> def map[R](fun: T => R): DataSet[R]
>>>>>>>> 
>>>>>>>> The Scala compiler gives precedence to the lambda version when you call
>>>>>>>> map() with a lambda in simple cases, so it works here. You could still
>>>>>>> call
>>>>>>>> map() with a lambda if the lambda version of the method weren't here
>>>>>>>> because they are now considered the same. For Scala 2.11 we need both
>>>>>>>> signatures, though, to allow calling with a lambda and with a
>>>>>>> MapFunction.
>>>>>>>> The problem is with more complicated method signatures, like:
>>>>>>>> 
>>>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>>>>>> DataSet[R]
>>>>>>>> 
>>>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>>>>>> 
>>>>>>>> (for reference, GroupReduceFunction is a SAM with void
>>>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>>>>>> 
>>>>>>>> These two signatures are not the same but similar enough for the Scala
>>>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>>>>>> reduceGroup()
>>>>>>>> with a lambda that doesn't have parameter type definitions and things
>>>>>>> would
>>>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>>>>>> figure
>>>>>>>> out which method to call and requires explicit type definitions on the
>>>>>>>> lambda parameters.
>>>>>>>> 
>>>>>>>> I see some solutions for this:
>>>>>>>> 
>>>>>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>>>>>> specify parameter types on their lambdas.
>>>>>>>> 
>>>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>>>>>> user function that takes Java-style interfaces (the first parameter is
>>>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>>>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>>>>>> parameter types but breaks the API.
>>>>>>>> 
>>>>>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>>>>>> it would allow people to implement user functions without having to cast
>>>>>>>> the various Iterator/Iterable parameters.
>>>>>>>> 
>>>>>>>> Either way, people would have to adapt their code when moving to Scala
>>>>>>>> 2.12 in some way, depending on what style of methods they use.
>>>>>>>> 
>>>>>>>> There is also solution 2.5:
>>>>>>>> 
>>>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>>>>>> old method names for Scala 2.11. This would require some infrastructure
>>>>>>> and
>>>>>>>> I don't yet know how it can be done in a sane way.
>>>>>>>> 
>>>>>>>> What do you think? I personally would be in favour of 2. but it breaks
>>>>>>> the
>>>>>>>> existing API.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Aljoscha
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> 
> 


Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Chesnay Schepler <ch...@apache.org>.
The infrastructure would only be required if we opt for releasing 2.11 
and 2.12 builds simultaneously, correct?

On 08.10.2018 15:04, Aljoscha Krettek wrote:
> Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.
>
>> On 8. Oct 2018, at 13:00, Chesnay Schepler <ch...@apache.org> wrote:
>>
>> And the remaining parts would only be about breaking the API?
>>
>> On 08.10.2018 12:24, Aljoscha Krettek wrote:
>>> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784
>>>
>>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <ch...@apache.org> wrote:
>>>>
>>>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
>>>> wondering about the benefit, as the API break still has to happen at some point.
>>>>
>>>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
>>>> If this is the only blocker I suggest to make the breaking change in 1.8.
>>>>
>>>> On 05.10.2018 10:31, Till Rohrmann wrote:
>>>>> Thanks Aljoscha for starting this discussion. The described problem brings
>>>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>>>>> API breaking because everyone who used lambdas without types needs to add
>>>>> them now. Consequently, I only see two real options out of the ones you've
>>>>> proposed:
>>>>>
>>>>> 1) Disambiguate the API (either by removing
>>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>>>>>
>>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>>>>> problematic because then all Scala API users who have implemented a
>>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>>>>> think it will be problematic with RichGroupReduceFunction which you need to
>>>>> get access to the RuntimeContext.
>>>>>
>>>>> Maintaining two master branches puts a lot of burden onto the developers to
>>>>> always keep the two branches in sync. Ideally I would like to avoid this.
>>>>>
>>>>> I also played a little bit around with implicit conversions to add the
>>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it work
>>>>> smoothly.
>>>>>
>>>>> I'm cross posting this thread to user as well to get some more user
>>>>> feedback.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The second alternative, with the addition of methods that take functions
>>>>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>>>>> then to maintain the *J Java parameter methods, or whether users could just
>>>>>> access the functionality by converting the Scala DataStreams to Java via
>>>>>> .javaStream and whatever the equivalent is for DataSets.
>>>>>>
>>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm currently working on
>>>>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>>>>> hurdle
>>>>>>> and I have to explain some context first.
>>>>>>>
>>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>>>>> means that the following two method definitions can both take a lambda:
>>>>>>>
>>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>>>>> def map[R](fun: T => R): DataSet[R]
>>>>>>>
>>>>>>> The Scala compiler gives precedence to the lambda version when you call
>>>>>>> map() with a lambda in simple cases, so it works here. You could still
>>>>>> call
>>>>>>> map() with a lambda if the lambda version of the method weren't here
>>>>>>> because they are now considered the same. For Scala 2.11 we need both
>>>>>>> signatures, though, to allow calling with a lambda and with a
>>>>>> MapFunction.
>>>>>>> The problem is with more complicated method signatures, like:
>>>>>>>
>>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>>>>> DataSet[R]
>>>>>>>
>>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>>>>>
>>>>>>> (for reference, GroupReduceFunction is a SAM with void
>>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>>>>>
>>>>>>> These two signatures are not the same but similar enough for the Scala
>>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>>>>> reduceGroup()
>>>>>>> with a lambda that doesn't have parameter type definitions and things
>>>>>> would
>>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>>>>> figure
>>>>>>> out which method to call and requires explicit type definitions on the
>>>>>>> lambda parameters.
>>>>>>>
>>>>>>> I see some solutions for this:
>>>>>>>
>>>>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>>>>> specify parameter types on their lambdas.
>>>>>>>
>>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>>>>> user function that takes Java-style interfaces (the first parameter is
>>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>>>>> parameter types but breaks the API.
>>>>>>>
>>>>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>>>>> it would allow people to implement user functions without having to cast
>>>>>>> the various Iterator/Iterable parameters.
>>>>>>>
>>>>>>> Either way, people would have to adapt their code when moving to Scala
>>>>>>> 2.12 in some way, depending on what style of methods they use.
>>>>>>>
>>>>>>> There is also solution 2.5:
>>>>>>>
>>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>>>>> old method names for Scala 2.11. This would require some infrastructure
>>>>>> and
>>>>>>> I don't yet know how it can be done in a sane way.
>>>>>>>
>>>>>>> What do you think? I personally would be in favour of 2. but it breaks
>>>>>> the
>>>>>>> existing API.
>>>>>>>
>>>>>>> Best,
>>>>>>> Aljoscha
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>


Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Chesnay Schepler <ch...@apache.org>.
The infrastructure would only be required if we opt for releasing 2.11 
and 2.12 builds simultaneously, correct?

On 08.10.2018 15:04, Aljoscha Krettek wrote:
> Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.
>
>> On 8. Oct 2018, at 13:00, Chesnay Schepler <ch...@apache.org> wrote:
>>
>> And the remaining parts would only be about breaking the API?
>>
>> On 08.10.2018 12:24, Aljoscha Krettek wrote:
>>> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784
>>>
>>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <ch...@apache.org> wrote:
>>>>
>>>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
>>>> wondering about the benefit, as the API break still has to happen at some point.
>>>>
>>>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
>>>> If this is the only blocker I suggest to make the breaking change in 1.8.
>>>>
>>>> On 05.10.2018 10:31, Till Rohrmann wrote:
>>>>> Thanks Aljoscha for starting this discussion. The described problem brings
>>>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>>>>> API breaking because everyone who used lambdas without types needs to add
>>>>> them now. Consequently, I only see two real options out of the ones you've
>>>>> proposed:
>>>>>
>>>>> 1) Disambiguate the API (either by removing
>>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>>>>>
>>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>>>>> problematic because then all Scala API users who have implemented a
>>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>>>>> think it will be problematic with RichGroupReduceFunction which you need to
>>>>> get access to the RuntimeContext.
>>>>>
>>>>> Maintaining two master branches puts a lot of burden onto the developers to
>>>>> always keep the two branches in sync. Ideally I would like to avoid this.
>>>>>
>>>>> I also played a little bit around with implicit conversions to add the
>>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it work
>>>>> smoothly.
>>>>>
>>>>> I'm cross posting this thread to user as well to get some more user
>>>>> feedback.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> The second alternative, with the addition of methods that take functions
>>>>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>>>>> then to maintain the *J Java parameter methods, or whether users could just
>>>>>> access the functionality by converting the Scala DataStreams to Java via
>>>>>> .javaStream and whatever the equivalent is for DataSets.
>>>>>>
>>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm currently working on
>>>>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>>>>> hurdle
>>>>>>> and I have to explain some context first.
>>>>>>>
>>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>>>>> means that the following two method definitions can both take a lambda:
>>>>>>>
>>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>>>>> def map[R](fun: T => R): DataSet[R]
>>>>>>>
>>>>>>> The Scala compiler gives precedence to the lambda version when you call
>>>>>>> map() with a lambda in simple cases, so it works here. You could still
>>>>>> call
>>>>>>> map() with a lambda if the lambda version of the method weren't here
>>>>>>> because they are now considered the same. For Scala 2.11 we need both
>>>>>>> signatures, though, to allow calling with a lambda and with a
>>>>>> MapFunction.
>>>>>>> The problem is with more complicated method signatures, like:
>>>>>>>
>>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>>>>> DataSet[R]
>>>>>>>
>>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>>>>>
>>>>>>> (for reference, GroupReduceFunction is a SAM with void
>>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>>>>>
>>>>>>> These two signatures are not the same but similar enough for the Scala
>>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>>>>> reduceGroup()
>>>>>>> with a lambda that doesn't have parameter type definitions and things
>>>>>> would
>>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>>>>> figure
>>>>>>> out which method to call and requires explicit type definitions on the
>>>>>>> lambda parameters.
>>>>>>>
>>>>>>> I see some solutions for this:
>>>>>>>
>>>>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>>>>> specify parameter types on their lambdas.
>>>>>>>
>>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>>>>> user function that takes Java-style interfaces (the first parameter is
>>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>>>>> parameter types but breaks the API.
>>>>>>>
>>>>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>>>>> it would allow people to implement user functions without having to cast
>>>>>>> the various Iterator/Iterable parameters.
>>>>>>>
>>>>>>> Either way, people would have to adapt their code when moving to Scala
>>>>>>> 2.12 in some way, depending on what style of methods they use.
>>>>>>>
>>>>>>> There is also solution 2.5:
>>>>>>>
>>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>>>>> old method names for Scala 2.11. This would require some infrastructure
>>>>>> and
>>>>>>> I don't yet know how it can be done in a sane way.
>>>>>>>
>>>>>>> What do you think? I personally would be in favour of 2. but it breaks
>>>>>> the
>>>>>>> existing API.
>>>>>>>
>>>>>>> Best,
>>>>>>> Aljoscha
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>
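The reduceGroup ambiguity described above can be reproduced without Flink. Below is a minimal self-contained sketch — Collector, GroupReduceFunction, and the Seq-based reduceGroup are illustrative stand-ins, not Flink's actual classes — showing that explicit lambda parameter types (option 1) resolve the overload:

```scala
import scala.collection.JavaConverters._

// Stand-ins for the Flink types discussed in the thread (illustrative only).
trait Collector[R] { def collect(record: R): Unit }

trait GroupReduceFunction[T, R] {
  def reduce(values: java.lang.Iterable[T], out: Collector[R]): Unit
}

object ReduceDemo {
  private val data = Seq(1, 2, 3, 4)

  // Scala-style overload: takes a scala.Iterator.
  def reduceGroup[R](fun: (Iterator[Int], Collector[R]) => Unit): Seq[R] = {
    val buf = scala.collection.mutable.Buffer.empty[R]
    fun(data.iterator, new Collector[R] { def collect(r: R): Unit = buf += r })
    buf.toSeq
  }

  // Java-style overload: takes the SAM interface with java.lang.Iterable.
  def reduceGroup[R](reducer: GroupReduceFunction[Int, R]): Seq[R] = {
    val buf = scala.collection.mutable.Buffer.empty[R]
    reducer.reduce(data.asJava, new Collector[R] { def collect(r: R): Unit = buf += r })
    buf.toSeq
  }

  def main(args: Array[String]): Unit = {
    // An untyped `(it, out) => ...` is ambiguous under Scala 2.12; annotating
    // the parameters rules out the SAM overload (its first parameter is an
    // Iterable, not an Iterator) and compiles on both 2.11 and 2.12.
    val sums = reduceGroup((it: Iterator[Int], out: Collector[Int]) => out.collect(it.sum))
    assert(sums == Seq(10))
    println(sums.mkString(","))
  }
}
```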


Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Aljoscha Krettek <al...@apache.org>.
Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.

> On 8. Oct 2018, at 13:00, Chesnay Schepler <ch...@apache.org> wrote:
> 
> And the remaining parts would only be about breaking the API?
> 
> On 08.10.2018 12:24, Aljoscha Krettek wrote:
>> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784
>> 
>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <ch...@apache.org> wrote:
>>> 
>>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
>>> wondering about the benefit, as the API break still has to happen at some point.
>>> 
>>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
>>> If this is the only blocker I suggest to make the breaking change in 1.8.
>>> 
>>> On 05.10.2018 10:31, Till Rohrmann wrote:
>>>> Thanks Aljoscha for starting this discussion. The described problem brings
>>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>>>> API breaking because everyone who used lambdas without types needs to add
>>>> them now. Consequently, I only see two real options out of the ones you've
>>>> proposed:
>>>> 
>>>> 1) Disambiguate the API (either by removing
>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>>>> 
>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>>>> problematic because then all Scala API users who have implemented a
>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>>>> think it will be problematic with RichGroupReduceFunction which you need to
>>>> get access to the RuntimeContext.
>>>> 
>>>> Maintaining two master branches puts a lot of burden onto the developers to
>>>> always keep the two branches in sync. Ideally I would like to avoid this.
>>>> 
>>>> I also played a little bit around with implicit conversions to add the
>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it to work
>>>> smoothly.
>>>> 
>>>> I'm cross posting this thread to user as well to get some more user
>>>> feedback.
>>>> 
>>>> Cheers,
>>>> Till
>>>> 
>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
>>>> wrote:
>>>> 
>>>>> The second alternative, with the addition of methods that take functions
>>>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>>>> then to maintain the *J Java parameter methods, or whether users could just
>>>>> access the functionality by converting the Scala DataStreams to Java via
>>>>> .javaStream and whatever the equivalent is for DataSets.
>>>>> 
>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I'm currently working on
>>>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>>>> hurdle
>>>>>> and I have to explain some context first.
>>>>>> 
>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>>>> means that the following two method definitions can both take a lambda:
>>>>>> 
>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>>>> def map[R](fun: T => R): DataSet[R]
>>>>>> 
>>>>>> The Scala compiler gives precedence to the lambda version when you call
>>>>>> map() with a lambda in simple cases, so it works here. You could still
>>>>> call
>>>>>> map() with a lambda if the lambda version of the method weren't here
>>>>>> because they are now considered the same. For Scala 2.11 we need both
>>>>>> signatures, though, to allow calling with a lambda and with a
>>>>> MapFunction.
>>>>>> The problem is with more complicated method signatures, like:
>>>>>> 
>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>>>> DataSet[R]
>>>>>> 
>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>>>> 
>>>>>> (for reference, GroupReduceFunction is a SAM with void
>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>>>> 
>>>>>> These two signatures are not the same but similar enough for the Scala
>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>>>> reduceGroup()
>>>>>> with a lambda that doesn't have parameter type definitions and things
>>>>> would
>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>>>> figure
>>>>>> out which method to call and requires explicit type definitions on the
>>>>>> lambda parameters.
>>>>>> 
>>>>>> I see some solutions for this:
>>>>>> 
>>>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>>>> specify parameter types on their lambdas.
>>>>>> 
>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>>>> user function that takes Java-style interfaces (the first parameter is
>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>>>> parameter types but breaks the API.
>>>>>> 
>>>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>>>> it would allow people to implement user functions without having to cast
>>>>>> the various Iterator/Iterable parameters.
>>>>>> 
>>>>>> Either way, people would have to adapt their code when moving to Scala
>>>>>> 2.12 in some way, depending on what style of methods they use.
>>>>>> 
>>>>>> There is also solution 2.5:
>>>>>> 
>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>>>> old method names for Scala 2.11. This would require some infrastructure
>>>>> and
>>>>>> I don't yet know how it can be done in a sane way.
>>>>>> 
>>>>>> What do you think? I personally would be in favour of 2. but it breaks
>>>>> the
>>>>>> existing API.
>>>>>> 
>>>>>> Best,
>>>>>> Aljoscha
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>> 
> 
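The map()/MapFunction situation quoted above can be condensed into a self-contained sketch; MapFunction and transform here are stand-ins for the Flink signatures, not the real API. Under Scala 2.12 a lambda could satisfy either overload via SAM conversion, but the compiler gives precedence to the function-typed one:

```scala
// Stand-in for org.apache.flink.api.common.functions.MapFunction (illustrative).
trait MapFunction[T, R] { def map(value: T): R }

object SamDemo {
  // Two overloads mirroring DataSet.map: a Java-style SAM and a Scala function.
  def transform[R](mapper: MapFunction[Int, R]): Seq[R] = Seq(1, 2, 3).map(mapper.map)
  def transform[R](fun: Int => R): Seq[R] = Seq(1, 2, 3).map(fun)

  def main(args: Array[String]): Unit = {
    // In 2.12 both overloads accept this lambda; the compiler prefers the
    // Int => R version. In 2.11 only the function overload applies anyway,
    // since lambdas are not SAM-convertible there.
    val doubled = transform((x: Int) => x * 2)
    assert(doubled == Seq(2, 4, 6))

    // The SAM overload remains reachable with an explicit instance:
    val squared = transform(new MapFunction[Int, Int] { def map(v: Int): Int = v * v })
    assert(squared == Seq(1, 4, 9))
  }
}
```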


Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Chesnay Schepler <ch...@apache.org>.
And the remaining parts would only be about breaking the API?

On 08.10.2018 12:24, Aljoscha Krettek wrote:
> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784
>
>> On 8. Oct 2018, at 09:56, Chesnay Schepler <ch...@apache.org> wrote:
>>
>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
>> wondering about the benefit, as the API break still has to happen at some point.
>>
>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
>> If this is the only blocker I suggest to make the breaking change in 1.8.
>>
>> On 05.10.2018 10:31, Till Rohrmann wrote:
>>> Thanks Aljoscha for starting this discussion. The described problem brings
>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>>> API breaking because everyone who used lambdas without types needs to add
>>> them now. Consequently, I only see two real options out of the ones you've
>>> proposed:
>>>
>>> 1) Disambiguate the API (either by removing
>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>>>
>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>>> problematic because then all Scala API users who have implemented a
>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>>> think it will be problematic with RichGroupReduceFunction which you need to
>>> get access to the RuntimeContext.
>>>
>>> Maintaining two master branches puts a lot of burden onto the developers to
>>> always keep the two branches in sync. Ideally I would like to avoid this.
>>>
>>> I also played a little bit around with implicit conversions to add the
>>> lambda methods in Scala 2.11 on demand, but I was not able to get it to work
>>> smoothly.
>>>
>>> I'm cross posting this thread to user as well to get some more user
>>> feedback.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
>>> wrote:
>>>
>>>> The second alternative, with the addition of methods that take functions
>>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>>> then to maintain the *J Java parameter methods, or whether users could just
>>>> access the functionality by converting the Scala DataStreams to Java via
>>>> .javaStream and whatever the equivalent is for DataSets.
>>>>
>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm currently working on
>>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>>> hurdle
>>>>> and I have to explain some context first.
>>>>>
>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>>> means that the following two method definitions can both take a lambda:
>>>>>
>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>>> def map[R](fun: T => R): DataSet[R]
>>>>>
>>>>> The Scala compiler gives precedence to the lambda version when you call
>>>>> map() with a lambda in simple cases, so it works here. You could still
>>>> call
>>>>> map() with a lambda if the lambda version of the method weren't here
>>>>> because they are now considered the same. For Scala 2.11 we need both
>>>>> signatures, though, to allow calling with a lambda and with a
>>>> MapFunction.
>>>>> The problem is with more complicated method signatures, like:
>>>>>
>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>>> DataSet[R]
>>>>>
>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>>>
>>>>> (for reference, GroupReduceFunction is a SAM with void
>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>>>
>>>>> These two signatures are not the same but similar enough for the Scala
>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>>> reduceGroup()
>>>>> with a lambda that doesn't have parameter type definitions and things
>>>> would
>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>>> figure
>>>>> out which method to call and requires explicit type definitions on the
>>>>> lambda parameters.
>>>>>
>>>>> I see some solutions for this:
>>>>>
>>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>>> specify parameter types on their lambdas.
>>>>>
>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>>> user function that takes Java-style interfaces (the first parameter is
>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>>> parameter types but breaks the API.
>>>>>
>>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>>> it would allow people to implement user functions without having to cast
>>>>> the various Iterator/Iterable parameters.
>>>>>
>>>>> Either way, people would have to adapt their code when moving to Scala
>>>>> 2.12 in some way, depending on what style of methods they use.
>>>>>
>>>>> There is also solution 2.5:
>>>>>
>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>>> old method names for Scala 2.11. This would require some infrastructure
>>>> and
>>>>> I don't yet know how it can be done in a sane way.
>>>>>
>>>>> What do you think? I personally would be in favour of 2. but it breaks
>>>> the
>>>>> existing API.
>>>>>
>>>>> Best,
>>>>> Aljoscha
>>>>>
>>>>>
>>>>>
>>>>>
>


Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Aljoscha Krettek <al...@apache.org>.
I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784

> On 8. Oct 2018, at 09:56, Chesnay Schepler <ch...@apache.org> wrote:
> 
> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
> wondering about the benefit, as the API break still has to happen at some point.
> 
> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
> If this is the only blocker I suggest to make the breaking change in 1.8.
> 
> On 05.10.2018 10:31, Till Rohrmann wrote:
>> Thanks Aljoscha for starting this discussion. The described problem brings
>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>> API breaking because everyone who used lambdas without types needs to add
>> them now. Consequently, I only see two real options out of the ones you've
>> proposed:
>> 
>> 1) Disambiguate the API (either by removing
>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>> 
>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>> problematic because then all Scala API users who have implemented a
>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>> think it will be problematic with RichGroupReduceFunction which you need to
>> get access to the RuntimeContext.
>> 
>> Maintaining two master branches puts a lot of burden onto the developers to
>> always keep the two branches in sync. Ideally I would like to avoid this.
>> 
>> I also played a little bit around with implicit conversions to add the
>> lambda methods in Scala 2.11 on demand, but I was not able to get it to work
>> smoothly.
>> 
>> I'm cross posting this thread to user as well to get some more user
>> feedback.
>> 
>> Cheers,
>> Till
>> 
>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
>> wrote:
>> 
>>> The second alternative, with the addition of methods that take functions
>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>> then to maintain the *J Java parameter methods, or whether users could just
>>> access the functionality by converting the Scala DataStreams to Java via
>>> .javaStream and whatever the equivalent is for DataSets.
>>> 
>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm currently working on
>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>> hurdle
>>>> and I have to explain some context first.
>>>> 
>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>> means that the following two method definitions can both take a lambda:
>>>> 
>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>> def map[R](fun: T => R): DataSet[R]
>>>> 
>>>> The Scala compiler gives precedence to the lambda version when you call
>>>> map() with a lambda in simple cases, so it works here. You could still
>>> call
>>>> map() with a lambda if the lambda version of the method weren't here
>>>> because they are now considered the same. For Scala 2.11 we need both
>>>> signatures, though, to allow calling with a lambda and with a
>>> MapFunction.
>>>> The problem is with more complicated method signatures, like:
>>>> 
>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>> DataSet[R]
>>>> 
>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>> 
>>>> (for reference, GroupReduceFunction is a SAM with void
>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>> 
>>>> These two signatures are not the same but similar enough for the Scala
>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>> reduceGroup()
>>>> with a lambda that doesn't have parameter type definitions and things
>>> would
>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>> figure
>>>> out which method to call and requires explicit type definitions on the
>>>> lambda parameters.
>>>> 
>>>> I see some solutions for this:
>>>> 
>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>> specify parameter types on their lambdas.
>>>> 
>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>> user function that takes Java-style interfaces (the first parameter is
>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>> parameter types but breaks the API.
>>>> 
>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>> it would allow people to implement user functions without having to cast
>>>> the various Iterator/Iterable parameters.
>>>> 
>>>> Either way, people would have to adapt their code when moving to Scala
>>>> 2.12 in some way, depending on what style of methods they use.
>>>> 
>>>> There is also solution 2.5:
>>>> 
>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>> old method names for Scala 2.11. This would require some infrastructure
>>> and
>>>> I don't yet know how it can be done in a sane way.
>>>> 
>>>> What do you think? I personally would be in favour of 2. but it breaks
>>> the
>>>> existing API.
>>>> 
>>>> Best,
>>>> Aljoscha
>>>> 
>>>> 
>>>> 
>>>> 
> 
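Option 2 from the thread — renaming the Java-interface method and adding a Scala-typed GroupReduceFunction — can be sketched as below. All names here (reduceGroupJ, SGroupReduceFunction, the Grouped class) are hypothetical illustrations of the proposal, not actual Flink API:

```scala
trait Collector[R] { def collect(record: R): Unit }

// Java-style SAM, mirroring the existing Flink interface:
trait JGroupReduceFunction[T, R] {
  def reduce(values: java.lang.Iterable[T], out: Collector[R]): Unit
}

// Scala-typed counterpart the proposal would add under api.scala,
// so user functions get a scala.Iterator without casting:
trait SGroupReduceFunction[T, R] {
  def reduce(values: Iterator[T], out: Collector[R]): Unit
}

class Grouped[T](data: Seq[T]) {
  private def run[R](body: Collector[R] => Unit): Seq[R] = {
    val buf = scala.collection.mutable.Buffer.empty[R]
    body(new Collector[R] { def collect(r: R): Unit = buf += r })
    buf.toSeq
  }

  // reduceGroup now takes only the Scala-typed interface...
  def reduceGroup[R](reducer: SGroupReduceFunction[T, R]): Seq[R] =
    run(out => reducer.reduce(data.iterator, out))

  // ...while the Java interface moves to a distinct name, so no overload
  // remains for the compiler to confuse.
  def reduceGroupJ[R](reducer: JGroupReduceFunction[T, R]): Seq[R] = {
    import scala.collection.JavaConverters._
    run(out => reducer.reduce(data.asJava, out))
  }
}

object RenameDemo {
  def main(args: Array[String]): Unit = {
    val g = new Grouped(Seq(1, 2, 3))
    // With the overload gone, an untyped lambda SAM-converts cleanly on 2.12:
    val sums = g.reduceGroup[Int]((values, out) => out.collect(values.sum))
    assert(sums == Seq(6))
  }
}
```

The cost, as noted in the thread, is the API break: existing calls to reduceGroup(GroupReduceFunction) would have to be renamed to reduceGroupJ.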


Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Aljoscha Krettek <al...@apache.org>.
I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784

> On 8. Oct 2018, at 09:56, Chesnay Schepler <ch...@apache.org> wrote:
> 
> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm
> wondering about the benefit, as the API break still has to happen at some point.
> 
> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API?
> If this is the only blocker I suggest to make the breaking change in 1.8.
> 
> On 05.10.2018 10:31, Till Rohrmann wrote:
>> Thanks Aljoscha for starting this discussion. The described problem brings
>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>> API breaking because everyone who used lambdas without types needs to add
>> them now. Consequently, I only see two real options out of the ones you've
>> proposed:
>> 
>> 1) Disambiguate the API (either by removing
>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>> 
>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>> problematic because then all Scala API users who have implemented a
>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>> think it will be problematic with RichGroupReduceFunction which you need to
>> get access to the RuntimeContext.
>> 
>> Maintaining two master branches puts a lot of burden onto the developers to
>> always keep the two branches in sync. Ideally I would like to avoid this.
>> 
>> I also played a little bit around with implicit conversions to add the
>> lambda methods in Scala 2.11 on demand, but I was not able to get it to
>> work smoothly.
>> 
>> I'm cross posting this thread to user as well to get some more user
>> feedback.
>> 
>> Cheers,
>> Till
>> 
>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
>> wrote:
>> 
>>> The second alternative, with the addition of methods that take functions
>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>> then to maintain the *J Java parameter methods, or whether users could just
>>> access the functionality by converting the Scala DataStreams to Java via
>>> .javaStream and whatever the equivalent is for DataSets.
>>> 
>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I'm currently working on
>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>> hurdle
>>>> and I have to explain some context first.
>>>> 
>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>> means that the following two method definitions can both take a lambda:
>>>> 
>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>> def map[R](fun: T => R): DataSet[R]
>>>> 
>>>> The Scala compiler gives precedence to the lambda version when you call
>>>> map() with a lambda in simple cases, so it works here. You could still
>>> call
>>>> map() with a lambda if the lambda version of the method weren't here
>>>> because they are now considered the same. For Scala 2.11 we need both
>>>> signatures, though, to allow calling with a lambda and with a
>>> MapFunction.
>>>> The problem is with more complicated method signatures, like:
>>>> 
>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>> DataSet[R]
>>>> 
>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>> 
>>>> (for reference, GroupReduceFunction is a SAM with void
>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>> 
>>>> These two signatures are not the same but similar enough for the Scala
>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>> reduceGroup()
>>>> with a lambda that doesn't have parameter type definitions and things
>>> would
>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>> figure
>>>> out which method to call and requires explicit type definitions on the
>>>> lambda parameters.
>>>> 
>>>> I see some solutions for this:
>>>> 
>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>> specify parameter types on their lambdas.
>>>> 
>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>> user function that takes Java-style interfaces (the first parameter is
>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>> disambiguates the code, users can use lambdas without specifying explicit
>>>> parameter types but breaks the API.
>>>> 
>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>>> it would allow people to implement user functions without having to cast
>>>> the various Iterator/Iterable parameters.
>>>> 
>>>> Either way, people would have to adapt their code when moving to Scala
>>>> 2.12 in some way, depending on what style of methods they use.
>>>> 
>>>> There is also solution 2.5:
>>>> 
>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>>> old method names for Scala 2.11. This would require some infrastructure
>>> and
>>>> I don't yet know how it can be done in a sane way.
>>>> 
>>>> What do you think? I personally would be in favour of 2. but it breaks
>>> the
>>>> existing API.
>>>> 
>>>> Best,
>>>> Aljoscha
>>>> 
>>>> 
>>>> 
>>>> 
> 


Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Posted by Chesnay Schepler <ch...@apache.org>.
I'd rather not maintain 2 master branches. Beyond the maintenance 
overhead I'm
wondering about the benefit, as the API break still has to happen at 
some point.

@Aljoscha how much work for supporting scala 2.12 can be merged without 
breaking the API?
If this is the only blocker I suggest to make the breaking change in 1.8.

On 05.10.2018 10:31, Till Rohrmann wrote:
> Thanks Aljoscha for starting this discussion. The described problem brings
> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
> API breaking because everyone who used lambdas without types needs to add
> them now. Consequently, I only see two real options out of the ones you've
> proposed:
>
> 1) Disambiguate the API (either by removing
> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out
>
> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
> problematic because then all Scala API users who have implemented a
> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
> think it will be problematic with RichGroupReduceFunction which you need to
> get access to the RuntimeContext.
>
> Maintaining two master branches puts a lot of burden onto the developers to
> always keep the two branches in sync. Ideally I would like to avoid this.
>
> I also played a little bit around with implicit conversions to add the
> lambda methods in Scala 2.11 on demand, but I was not able to get it to
> work smoothly.
>
> I'm cross posting this thread to user as well to get some more user
> feedback.
>
> Cheers,
> Till
>
> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fe...@gmail.com>
> wrote:
>
>> The second alternative, with the addition of methods that take functions
>> with Scala types, seems the most sensible.  I wonder if there is a need
>> then to maintain the *J Java parameter methods, or whether users could just
>> access the functionality by converting the Scala DataStreams to Java via
>> .javaStream and whatever the equivalent is for DataSets.
>>
>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <al...@apache.org>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm currently working on
>> https://issues.apache.org/jira/browse/FLINK-7811,
>>> with the goal of adding support for Scala 2.12. There is a bit of a
>> hurdle
>>> and I have to explain some context first.
>>>
>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>> means that the following two method definitions can both take a lambda:
>>>
>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>> def map[R](fun: T => R): DataSet[R]
>>>
>>> The Scala compiler gives precedence to the lambda version when you call
>>> map() with a lambda in simple cases, so it works here. You could still
>> call
>>> map() with a lambda if the lambda version of the method weren't here
>>> because they are now considered the same. For Scala 2.11 we need both
>>> signatures, though, to allow calling with a lambda and with a
>> MapFunction.
>>> The problem is with more complicated method signatures, like:
>>>
>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>> DataSet[R]
>>>
>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>
>>> (for reference, GroupReduceFunction is a SAM with void
>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>
>>> These two signatures are not the same but similar enough for the Scala
>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>> reduceGroup()
>>> with a lambda that doesn't have parameter type definitions and things
>> would
>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>> figure
>>> out which method to call and requires explicit type definitions on the
>>> lambda parameters.
>>>
>>> I see some solutions for this:
>>>
>>> 1. Keep the methods as is, this would force people to always explicitly
>>> specify parameter types on their lambdas.
>>>
>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>> user function that takes Java-style interfaces (the first parameter is
>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>> disambiguates the code, users can use lambdas without specifying explicit
>>> parameter types but breaks the API.
>>>
>>> One effect of 2. would be that we can add a reduceGroup() method that
>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus
>>> it would allow people to implement user functions without having to cast
>>> the various Iterator/Iterable parameters.
>>>
>>> Either way, people would have to adapt their code when moving to Scala
>>> 2.12 in some way, depending on what style of methods they use.
>>>
>>> There is also solution 2.5:
>>>
>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the
>>> old method names for Scala 2.11. This would require some infrastructure
>> and
>>> I don't yet know how it can be done in a sane way.
>>>
>>> What do you think? I personally would be in favour of 2. but it breaks
>> the
>>> existing API.
>>>
>>> Best,
>>> Aljoscha
>>>
>>>
>>>
>>>

