Posted to yarn-dev@hadoop.apache.org by Prabhu Joseph <pr...@gmail.com> on 2022/11/23 16:44:06 UTC

MapReduce Terasort job is slow on Java 11

Hi, any pointers on why the MapReduce Terasort job is slower on Java 11
than on Java 8? The input data, configs, number of worker nodes, node
instance type, Hadoop version, and resources are the same in both runs.
I compared the application logs of the good and bad runs and observed that
the average task time (for both map and reduce tasks) is higher on Java 11.

*Java 8: 7 min 2 secs*

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
-Dmapred.reduce.tasks=120
/tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
/tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
2022-11-23 12:29:59,520 INFO terasort.TeraSort: done

*Java 11: 9 min 37 secs*

[hadoop@ip-172-31-60-208 ~]$ hadoop jar
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
-Dmapred.reduce.tasks=120
/tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
/tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
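
The elapsed time of each run can be cross-checked from the "starting" and
"done" log lines above. A minimal sketch, assuming the timestamp layout shown
in those lines (yyyy-MM-dd HH:mm:ss,SSS); the class and method names are
illustrative and not part of Hadoop:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative helper (not part of Hadoop): elapsed time between two log timestamps. */
public class TeraSortElapsed {
    // Timestamp layout assumed from the log lines, e.g. "2022-11-23 12:22:41,948".
    private static final DateTimeFormatter LOG_TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Returns the wall-clock duration between a "starting" and a "done" timestamp. */
    public static Duration elapsed(String start, String done) {
        return Duration.between(LocalDateTime.parse(start, LOG_TS),
                                LocalDateTime.parse(done, LOG_TS));
    }

    /** Formats a duration as whole minutes and seconds, dropping milliseconds. */
    public static String format(Duration d) {
        return d.toMinutes() + " min " + (d.getSeconds() % 60) + " s";
    }

    public static void main(String[] args) {
        // Timestamps copied from the two runs in this post.
        Duration java8 = elapsed("2022-11-23 12:22:41,948", "2022-11-23 12:29:59,520");
        Duration java11 = elapsed("2022-11-23 12:22:44,167", "2022-11-23 12:32:21,791");
        System.out.println("Java 8:  " + format(java8));
        System.out.println("Java 11: " + format(java11));
        // Slowdown factor of the Java 11 run relative to the Java 8 run.
        System.out.println("Ratio:   " + (double) java11.toMillis() / java8.toMillis());
    }
}
```

From the timestamps above this gives about 7 min 17 s for the Java 8 run and
9 min 37 s for the Java 11 run, roughly a 1.32x slowdown. The Java 8 value
derived from the log is slightly higher than the 7 min 2 secs quoted above.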

Thanks,
Prabhu Joseph

Re: MapReduce Terasort job is slow on Java 11

Posted by Ashutosh Gupta <as...@gmail.com>.
Hi Wei-Chiu

We are running all processes on JDK11, not just the MR job.

Thanks,
Ashutosh

On Wed, Nov 23, 2022 at 4:59 PM Wei-Chiu Chuang <we...@apache.org> wrote:

> For the JDK11 case, does everything on the cluster run on JDK11, or is it only
> the MR job that is on JDK11?
>
> We have users running JDK11 in production. NameNode (NN) GC performance was
> the only issue we were aware of.
> In the past we noticed that, because JDK11 uses G1GC by default, performance
> on large NameNodes was worse than on JDK8 with CMS.
>
>
> On Wed, Nov 23, 2022 at 8:44 AM Prabhu Joseph <pr...@gmail.com>
> wrote:
>
> > Hi, any pointers on why the MapReduce Terasort job is slower on Java 11
> > than on Java 8? The input data, configs, number of worker nodes, node
> > instance type, Hadoop version, and resources are the same in both runs.
> > I compared the application logs of the good and bad runs and observed that
> > the average task time (for both map and reduce tasks) is higher on Java 11.
> >
> > *Java 8: 7 min 2 secs*
> >
> > hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
> > -Dmapred.reduce.tasks=120
> > /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
> > /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> > 2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
> > 2022-11-23 12:29:59,520 INFO terasort.TeraSort: done
> >
> > *Java 11: 9 min 37 secs*
> >
> > [hadoop@ip-172-31-60-208 ~]$ hadoop jar
> > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
> > -Dmapred.reduce.tasks=120
> > /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
> > /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> > 2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
> > 2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
> >
> > Thanks,
> > Prabhu Joseph
> >
>

Re: MapReduce Terasort job is slow on Java 11

Posted by Wei-Chiu Chuang <we...@apache.org>.
For the JDK11 case, does everything on the cluster run on JDK11, or is it only
the MR job that is on JDK11?

We have users running JDK11 in production. NameNode (NN) GC performance was
the only issue we were aware of.
In the past we noticed that, because JDK11 uses G1GC by default, performance
on large NameNodes was worse than on JDK8 with CMS.
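
To confirm which collector a given JVM is actually using, the standard
java.lang.management API can list the active garbage collectors. A minimal
sketch; the class name is illustrative, and the bean names mentioned in the
comments are the ones HotSpot typically reports, which may differ on other
JVM builds:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (not part of Hadoop): report the collectors active in this JVM. */
public class ShowGc {
    /** Returns the names of the garbage collectors registered in the running JVM. */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Java version: " + System.getProperty("java.version"));
        // Typical HotSpot names (assumption, may vary by build):
        //   G1:       "G1 Young Generation", "G1 Old Generation"
        //   Parallel: "PS Scavenge", "PS MarkSweep"
        //   CMS:      "ParNew", "ConcurrentMarkSweep"
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```

Running this once with the default options and once with an explicit flag
such as -XX:+UseParallelGC shows which collector each JVM picks. As an
experiment (not a confirmed fix), the same flag could be passed to the MR
task JVMs through mapreduce.map.java.opts and mapreduce.reduce.java.opts to
see whether the collector choice accounts for any of the difference.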


On Wed, Nov 23, 2022 at 8:44 AM Prabhu Joseph <pr...@gmail.com>
wrote:

> Hi, any pointers on why the MapReduce Terasort job is slower on Java 11
> than on Java 8? The input data, configs, number of worker nodes, node
> instance type, Hadoop version, and resources are the same in both runs.
> I compared the application logs of the good and bad runs and observed that
> the average task time (for both map and reduce tasks) is higher on Java 11.
>
> *Java 8: 7 min 2 secs*
>
> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
> -Dmapred.reduce.tasks=120
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
> 2022-11-23 12:29:59,520 INFO terasort.TeraSort: done
>
> *Java 11: 9 min 37 secs*
>
> [hadoop@ip-172-31-60-208 ~]$ hadoop jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
> -Dmapred.reduce.tasks=120
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
> 2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
>
> Thanks,
> Prabhu Joseph
>

Re: MapReduce Terasort job is slow on Java 11

Posted by Ashutosh Gupta <as...@gmail.com>.
Hi Prabhu

Thanks for bringing this to our attention. I can see it as part of
HADOOP-15338 <https://issues.apache.org/jira/browse/HADOOP-15338>. The
degradation was also pointed out in the past by @Ayush and @zhenhe. We should
continue this discussion.

@Ayush, did we ever find the root cause?

Thanks & Regards,
Ashutosh

On Wed, Nov 23, 2022 at 4:44 PM Prabhu Joseph <pr...@gmail.com>
wrote:

> Hi, any pointers on why the MapReduce Terasort job is slower on Java 11
> than on Java 8? The input data, configs, number of worker nodes, node
> instance type, Hadoop version, and resources are the same in both runs.
> I compared the application logs of the good and bad runs and observed that
> the average task time (for both map and reduce tasks) is higher on Java 11.
>
> *Java 8: 7 min 2 secs*
>
> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
> -Dmapred.reduce.tasks=120
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
> 2022-11-23 12:29:59,520 INFO terasort.TeraSort: done
>
> *Java 11: 9 min 37 secs*
>
> [hadoop@ip-172-31-60-208 ~]$ hadoop jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort
> -Dmapred.reduce.tasks=120
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/
> /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
> 2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
>
> Thanks,
> Prabhu Joseph
>
