Posted to mapreduce-dev@hadoop.apache.org by Eric Badger <eb...@verizonmedia.com.INVALID> on 2021/04/27 20:06:47 UTC

Java 8 Lambdas

Hello all,

I'd like to gauge the community on the usage of lambdas within Hadoop code.
I've been reviewing a lot of patches recently that either add or modify
lambdas and I'm beginning to think that sometimes we, as a community, are
writing lambdas because we can rather than because we should. To me, it
seems that lambdas often decrease the readability of the code, making it
more difficult to understand. I don't personally know a lot about the
performance of lambdas and welcome arguments on behalf of why lambdas
should be used. An additional argument is that lambdas aren't available in
Java 7, and branch-2.10 currently supports Java 7. So any code going back
to branch-2.10 has to be redone upon backporting. Anyway, my main point
here is to encourage us to rethink whether we should be using lambdas in
any given circumstance just because we can.

Eric

p.s. I'm also happy to accept this as my personal "old man yells at cloud"
issue if everyone else thinks lambdas are the greatest

Re: Java 8 Lambdas

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
LambdaTestUtils added l-expression-based intercept() in HADOOP-13716 in
October 2016.
Five Years Ago. That was still Java 7...we added it knowing what Java 8
would bring.

There is no way we could go back to not using intercept() in tests.
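An intercept()-style helper shows why lambdas suit test code. The code that is expected to fail is passed in as a lambda and evaluated inside the helper. The sketch below is for illustration only and makes assumptions: InterceptSketch and its exact signature are hypothetical, not copied from Hadoop's LambdaTestUtils. When the exception type matches, it returns the caught exception. It rethrows any other exception, and it fails if the lambda completes normally.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for an intercept()-style helper; not Hadoop's LambdaTestUtils code.
final class InterceptSketch {

  private InterceptSketch() {
  }

  /**
   * Evaluates the lambda and expects it to raise an exception of type clazz.
   * Returns the caught exception so the test can make further assertions on it.
   */
  static <T, E extends Throwable> E intercept(Class<E> clazz, Callable<T> eval)
      throws Exception {
    T result;
    try {
      result = eval.call();
    } catch (Throwable t) {
      if (clazz.isInstance(t)) {
        return clazz.cast(t);
      }
      // Unexpected exception type: rethrow it so the test fails with the real cause.
      if (t instanceof Exception) {
        throw (Exception) t;
      }
      if (t instanceof Error) {
        throw (Error) t;
      }
      throw new Exception(t);
    }
    // The lambda completed normally, which counts as a test failure.
    throw new AssertionError(
        "Expected " + clazz.getName() + " but the call returned " + result);
  }

  public static void main(String[] args) throws Exception {
    IllegalStateException ex = intercept(IllegalStateException.class,
        () -> {
          throw new IllegalStateException("closed");
        });
    System.out.println("intercepted: " + ex.getMessage());
  }
}
```

The helper returns the exception, so a test can go on to check its message or cause without a try/catch block in every test method.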

Since then, some other big l-expression work I've been involved in includes
the org.apache.hadoop.fs.s3a.Invoker, which lets you execute a remote
operation with conversion of AWS SDK exceptions into IOEs, and a retry
policy based on those IOEs.

    final String region = invoker.retry("getBucketLocation()", bucketName, true,
        () -> s3.getBucketLocation(bucketName));

That was HADOOP-13786, S3A committers: 2017. Four years ago, which was
after branch-3 was Java 8 only.
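The retry-around-a-lambda idea behind the Invoker can be sketched as follows. This is a simplified, hypothetical stand-in, not the S3A Invoker. RetrySketch, CallableRaisingIOE and retry(action, maxAttempts, operation) are illustrative names. The fixed attempt count, with no backoff and no AWS SDK exception translation, is an assumption made to keep the sketch short.

```java
import java.io.IOException;

// Hypothetical retry wrapper for illustration; not the S3A Invoker.
final class RetrySketch {

  /** A callable which may raise an IOException (hypothetical interface). */
  @FunctionalInterface
  interface CallableRaisingIOE<T> {
    T apply() throws IOException;
  }

  private RetrySketch() {
  }

  /**
   * Invokes the operation up to maxAttempts times, retrying on IOException.
   * The action string is only used to build the final failure message.
   */
  static <T> T retry(String action, int maxAttempts, CallableRaisingIOE<T> operation)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return operation.apply();
      } catch (IOException e) {
        // A real policy would inspect the exception and back off between attempts.
        last = e;
      }
    }
    throw new IOException(action + " failed after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    // Fails twice, then succeeds; the lambda captures the calls counter.
    String region = retry("getBucketLocation()", 3, () -> {
      if (++calls[0] < 3) {
        throw new IOException("transient failure " + calls[0]);
      }
      return "us-west-2";
    });
    System.out.println(region + " after " + calls[0] + " calls");
  }
}
```

The caller supplies only the remote call, as a lambda. The retry loop lives in one tested place, and in a fuller version so would the exception translation and the backoff policy.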

More recently, if you look closely, the whole
org.apache.hadoop.util.functional package is designed to give us basic
functional programming around IOE-raising code, including our remote
iterators, giving us a minimal *and tested* set of transformations we can
apply to our code.

  public RemoteIterator<S3ALocatedFileStatus> createLocatedFileStatusIterator(
      RemoteIterator<S3AFileStatus> statusIterator) {
    return RemoteIterators.mappingRemoteIterator(
        statusIterator,
        listingOperationCallbacks::toLocatedFileStatus);
  }
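A mapping iterator like the one above can be sketched with two small interfaces. RemoteIter and FunctionRaisingIOE are hypothetical minimal stand-ins for Hadoop's RemoteIterator and its IOException-raising function types, and mapping() is not the RemoteIterators implementation. The sketch only shows how a method reference becomes the per-element transformation while IOExceptions propagate unchanged.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a lazily mapping iterator; not Hadoop's RemoteIterators.
final class MappingIteratorSketch {

  /** Minimal iterator whose methods may raise IOException (hypothetical). */
  interface RemoteIter<T> {
    boolean hasNext() throws IOException;

    T next() throws IOException;
  }

  /** Function which may raise IOException (hypothetical). */
  @FunctionalInterface
  interface FunctionRaisingIOE<S, T> {
    T apply(S source) throws IOException;
  }

  private MappingIteratorSketch() {
  }

  /** Wraps a source iterator, applying mapper to each element as it is fetched. */
  static <S, T> RemoteIter<T> mapping(RemoteIter<S> source,
      FunctionRaisingIOE<? super S, ? extends T> mapper) {
    return new RemoteIter<T>() {
      @Override
      public boolean hasNext() throws IOException {
        return source.hasNext();
      }

      @Override
      public T next() throws IOException {
        return mapper.apply(source.next());
      }
    };
  }

  /** Adapts an in-memory list so the example runs without a filesystem. */
  static <T> RemoteIter<T> fromList(List<T> list) {
    Iterator<T> it = list.iterator();
    return new RemoteIter<T>() {
      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public T next() {
        return it.next();
      }
    };
  }

  public static void main(String[] args) throws IOException {
    RemoteIter<Integer> lengths =
        mapping(fromList(Arrays.asList("a", "bb", "ccc")), String::length);
    while (lengths.hasNext()) {
      System.out.println(lengths.next());
    }
  }
}
```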

This ties in nicely with the duration tracking/IOStatistics code it came in
with (HADOOP-17450), so I can evaluate an operation and collect min/mean/max
durations, not just logging them but serializing them into the task/job
summary files, which gives some detail on where the bottlenecks are when
talking to cloud services.

final RemoteIterator<FileStatus> listing =
    trackDuration(iostatistics, OP_DIRECTORY_SCAN, () ->
        operations.listStatusIterator(srcDir));
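The duration-tracking wrapper can be approximated as below. DurationSketch, DurationStats and this trackDuration() signature are assumptions for illustration, not the IOStatistics API. The original call keys its statistics by a name such as OP_DIRECTORY_SCAN. This sketch simply passes one stats holder per operation type. The duration is recorded in a finally block, so failed calls are timed too.

```java
import java.io.IOException;

// Hypothetical duration tracker for illustration; not Hadoop's IOStatistics API.
final class DurationSketch {

  /** A callable which may raise an IOException (hypothetical interface). */
  @FunctionalInterface
  interface CallableRaisingIOE<T> {
    T apply() throws IOException;
  }

  /** Accumulates count and min/mean/max durations in nanoseconds for one operation type. */
  static final class DurationStats {
    private long count;
    private long totalNanos;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos = Long.MIN_VALUE;

    synchronized void add(long nanos) {
      count++;
      totalNanos += nanos;
      minNanos = Math.min(minNanos, nanos);
      maxNanos = Math.max(maxNanos, nanos);
    }

    synchronized long count() {
      return count;
    }

    synchronized long minNanos() {
      return count == 0 ? 0 : minNanos;
    }

    synchronized long maxNanos() {
      return count == 0 ? 0 : maxNanos;
    }

    synchronized double meanNanos() {
      return count == 0 ? 0 : (double) totalNanos / count;
    }
  }

  private DurationSketch() {
  }

  /** Times the operation and records its duration, whether it succeeds or throws. */
  static <T> T trackDuration(DurationStats stats, CallableRaisingIOE<T> operation)
      throws IOException {
    long start = System.nanoTime();
    try {
      return operation.apply();
    } finally {
      stats.add(System.nanoTime() - start);
    }
  }

  public static void main(String[] args) throws IOException {
    DurationStats scans = new DurationStats();
    for (int i = 0; i < 3; i++) {
      trackDuration(scans, () -> "listing");
    }
    System.out.printf("count=%d min=%dns mean=%.1fns max=%dns%n",
        scans.count(), scans.minNanos(), scans.meanNanos(), scans.maxNanos());
  }
}
```

A fuller version would probably keep separate statistics for failed calls, so that slow failures do not skew the success figures.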


So I'm afraid that I will carry on using l-expressions, such as in
HADOOP-17511, but I don't expect any of the code there to be backportable
to Java 7 (*).

At the same time, I'd like to know what the performance impact of our use of
l-expressions is, in terms of the cost of closure allocation, evaluation,
etc. There's also the *little* detail that stack trace data doesn't get
preserved that well. Together these argue against gratuitous use of Java
streams.
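The allocation question largely depends on whether a lambda captures state. A non-capturing lambda uses only its parameters, so the JVM may hand back one shared instance. A capturing lambda closes over local variables (or this), so evaluating it normally creates a new object to hold them. The sketch below only contrasts the two forms. The language specification does not guarantee whether instances are shared; that is a JVM implementation detail. The identity checks are therefore printed for curiosity, not relied on.

```java
import java.util.function.IntUnaryOperator;

// Illustrative contrast between non-capturing and capturing lambdas.
final class CaptureSketch {

  private CaptureSketch() {
  }

  /** Non-capturing: the lambda uses only its parameter. */
  static IntUnaryOperator doubler() {
    return x -> x * 2;
  }

  /** Capturing: the lambda closes over the effectively final 'offset' argument. */
  static IntUnaryOperator adder(int offset) {
    return x -> x + offset;
  }

  public static void main(String[] args) {
    // HotSpot often reuses one instance for a non-capturing lambda, but this is
    // an implementation detail and must not be relied upon.
    System.out.println("same doubler instance: " + (doubler() == doubler()));
    // Each capturing evaluation needs somewhere to store 'offset', so a new
    // object is normally allocated per call.
    System.out.println("same adder instance: " + (adder(1) == adder(1)));
    System.out.println(doubler().applyAsInt(21) + " " + adder(1).applyAsInt(41));
  }
}
```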


To summarise my PoV, then:

Java 8 lambda expressions are an incredible tool which can be used in
interesting and innovative ways. Adding retryability, stats gathering and
auditing to remote IO are the key uses I've been putting them to in the
Hadoop codebase for 4-5 years.

I'm happy to let someone lay out a style guide on good/bad uses, a "no
gratuitous move to streams()" policy, and maybe a designated "No lambdas
here" bit of code. (UGI?)

But a discussion about whether to have them in the code at all? Not only is
it too late, I don't see how that can be justified.

-Steve

(*) Having recently been backporting some ABFS code to a branch-3.1 fork, I've
found Mockito version downgrading is enough of a blocker on test cases there
that the language version is a detail...you won't get that far.

Re: [E] Re: Java 8 Lambdas

Posted by Jim Brennan <ja...@verizonmedia.com.INVALID>.
I just think that we should be cognizant of changes (particularly bug
fixes) that will need to be ported to branch-2.10. Since it is still on
Java 7, any time you use a lambda in code on trunk, we need to change it for
branch-2.10. While not difficult, this is extra work and it increases the
differences between branches, which can also cause more conflicts when
porting bug fixes back.
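To illustrate the backporting cost, here is the same comparator written two ways. One uses a trunk-style Java 8 lambda. The other is an anonymous inner class, the Java 7 form needed on branch-2.10. The host-name sorting example is made up for illustration. The Java 7 version uses only Collections.sort, Comparator and Integer.compare, all of which exist in Java 7. The class as a whole still needs Java 8 to compile because of the lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative example: one comparator, written for trunk and for a Java 7 branch.
final class BackportSketch {

  private BackportSketch() {
  }

  /** Trunk (Java 8+) style: comparator written as a lambda. */
  static List<String> sortByLengthJava8(List<String> hosts) {
    List<String> sorted = new ArrayList<>(hosts);
    sorted.sort((a, b) -> Integer.compare(a.length(), b.length()));
    return sorted;
  }

  /** branch-2.10 (Java 7) style: the same comparator as an anonymous inner class. */
  static List<String> sortByLengthJava7(List<String> hosts) {
    List<String> sorted = new ArrayList<String>(hosts);
    Collections.sort(sorted, new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
      }
    });
    return sorted;
  }

  public static void main(String[] args) {
    List<String> hosts = Arrays.asList("node-100", "nn1", "datanode-7");
    System.out.println(sortByLengthJava8(hosts));
    System.out.println(sortByLengthJava7(hosts));
  }
}
```

Both methods behave identically. However, every such lambda on trunk becomes a hand-written rewrite like the second method when a fix is cherry-picked to branch-2.10.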


On Wed, Apr 28, 2021 at 9:28 PM Ahmed Hussein <a...@ahussein.me> wrote:

> Thanks Eric for raising this issue!
>
> The debate about lambda is very complicated and won't be resolved any time
> soon.
>
> > I don't personally know a lot about the
> > performance of lambdas and welcome arguments on behalf of why lambdas
>
> No one probably knows :)
> - Lambda performance would depend on the JVM implementation, which changes
>   between releases.
> - Some Java 8+ features force the use of lambdas, for example
>   ConcurrentHashMap.computeIfAbsent().
>
> I believe that we can transform this discussion into specific action items
> for future commits:
> For instance, a couple of those specifications would be:
> - No refactoring just for the sake of using lambdas, unless there is a strong
>   technical justification.
> - Usage of lambdas in unit tests should be fine. If a lambda makes the test
>   more readable and allows passing method references, then it should make it
>   into the unit tests.
> - We put sample code in the "how-to-contribute" guide to explain capturing
>   vs. non-capturing lambda expressions and the performance implications of
>   each type.
> - Without getting into much detail, IMHO, streams should be committed into
>   the code only in exceptional cases. The possibility of executing code in
>   parallel makes debugging a nightmare. E.g., usage of forEach needs to be
>   justified: what does it bring to the table?
>
> On Tue, Apr 27, 2021 at 3:07 PM Eric Badger
> <eb...@verizonmedia.com.invalid> wrote:
>
> > Hello all,
> >
> > I'd like to gauge the community on the usage of lambdas within Hadoop code.
> > I've been reviewing a lot of patches recently that either add or modify
> > lambdas and I'm beginning to think that sometimes we, as a community, are
> > writing lambdas because we can rather than because we should. To me, it
> > seems that lambdas often decrease the readability of the code, making it
> > more difficult to understand. I don't personally know a lot about the
> > performance of lambdas and welcome arguments on behalf of why lambdas
> > should be used. An additional argument is that lambdas aren't available in
> > Java 7, and branch-2.10 currently supports Java 7. So any code going back
> > to branch-2.10 has to be redone upon backporting. Anyway, my main point
> > here is to encourage us to rethink whether we should be using lambdas in
> > any given circumstance just because we can.
> >
> > Eric
> >
> > p.s. I'm also happy to accept this as my personal "old man yells at cloud"
> > issue if everyone else thinks lambdas are the greatest
> >
>
>
> --
> Best Regards,
>
> *Ahmed Hussein, PhD*
>

Re: Java 8 Lambdas

Posted by Ahmed Hussein <a...@ahussein.me>.
Thanks Eric for raising this issue!

The debate about lambda is very complicated and won't be resolved any time
soon.

> I don't personally know a lot about the
> performance of lambdas and welcome arguments on behalf of why lambdas

No one probably knows :)
- Lambda performance would depend on the JVM implementation, which changes
  between releases.
- Some Java 8+ features force the use of lambdas, for example
  ConcurrentHashMap.computeIfAbsent().
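As an example of the kind of API meant here, ConcurrentHashMap.computeIfAbsent() takes a function that builds the value only when the key is missing. That function is most naturally written as a lambda. The per-user counters are a hypothetical use case. For comparison, the sketch also shows the usual Java 7 pattern with putIfAbsent(), which has to create a candidate value up front.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-user counters, contrasting computeIfAbsent() with putIfAbsent().
final class ComputeIfAbsentSketch {

  private ComputeIfAbsentSketch() {
  }

  /**
   * Java 8+: the lambda runs only if the key is absent; for ConcurrentHashMap the
   * whole call is atomic, so at most one counter is created per key.
   */
  static long increment(ConcurrentHashMap<String, AtomicLong> counters, String user) {
    return counters.computeIfAbsent(user, key -> new AtomicLong()).incrementAndGet();
  }

  /**
   * Java 7 style: create a candidate counter eagerly and keep whichever one
   * ends up in the map.
   */
  static long incrementJava7(ConcurrentMap<String, AtomicLong> counters, String user) {
    AtomicLong candidate = new AtomicLong();
    AtomicLong existing = counters.putIfAbsent(user, candidate);
    return (existing != null ? existing : candidate).incrementAndGet();
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();
    increment(counters, "alice");
    increment(counters, "alice");
    incrementJava7(counters, "bob");
    System.out.println(counters);
  }
}
```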

I believe that we can transform this discussion into specific action items
for future commits:
For instance, a couple of those specifications would be:
- No refactoring just for the sake of using lambdas, unless there is a strong
  technical justification.
- Usage of lambdas in unit tests should be fine. If a lambda makes the test
  more readable and allows passing method references, then it should make it
  into the unit tests.
- We put sample code in the "how-to-contribute" guide to explain capturing
  vs. non-capturing lambda expressions and the performance implications of
  each type.
- Without getting into much detail, IMHO, streams should be committed into
  the code only in exceptional cases. The possibility of executing code in
  parallel makes debugging a nightmare. E.g., usage of forEach needs to be
  justified: what does it bring to the table?
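For the streams point, here is the same filter-and-sum written as a plain loop and as a stream pipeline. The block-size data is hypothetical. Both versions run sequentially. Adding parallel() to the stream would spread the work over the common ForkJoinPool, and that is where the debugging concerns come from. In the loop version, every step is also ordinary code that a breakpoint or stack trace points at directly.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative comparison of a loop and an equivalent sequential stream.
final class StreamVsLoopSketch {

  private StreamVsLoopSketch() {
  }

  /** Plain loop: easy to step through and to set breakpoints in. */
  static long totalLargeBlocksLoop(List<Long> blockSizes, long threshold) {
    long total = 0;
    for (long size : blockSizes) {
      if (size >= threshold) {
        total += size;
      }
    }
    return total;
  }

  /**
   * Stream pipeline: more compact, but stack traces from inside it typically
   * include stream-internal and synthetic lambda frames.
   */
  static long totalLargeBlocksStream(List<Long> blockSizes, long threshold) {
    return blockSizes.stream()
        .filter(size -> size >= threshold)
        .mapToLong(Long::longValue)
        .sum();
  }

  public static void main(String[] args) {
    List<Long> sizes = Arrays.asList(64L, 128L, 256L, 32L);
    System.out.println(totalLargeBlocksLoop(sizes, 128) + " "
        + totalLargeBlocksStream(sizes, 128));
  }
}
```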

On Tue, Apr 27, 2021 at 3:07 PM Eric Badger
<eb...@verizonmedia.com.invalid> wrote:

> Hello all,
>
> I'd like to gauge the community on the usage of lambdas within Hadoop code.
> I've been reviewing a lot of patches recently that either add or modify
> lambdas and I'm beginning to think that sometimes we, as a community, are
> writing lambdas because we can rather than because we should. To me, it
> seems that lambdas often decrease the readability of the code, making it
> more difficult to understand. I don't personally know a lot about the
> performance of lambdas and welcome arguments on behalf of why lambdas
> should be used. An additional argument is that lambdas aren't available in
> Java 7, and branch-2.10 currently supports Java 7. So any code going back
> to branch-2.10 has to be redone upon backporting. Anyway, my main point
> here is to encourage us to rethink whether we should be using lambdas in
> any given circumstance just because we can.
>
> Eric
>
> p.s. I'm also happy to accept this as my personal "old man yells at cloud"
> issue if everyone else thinks lambdas are the greatest
>


-- 
Best Regards,

*Ahmed Hussein, PhD*
