Posted to yarn-dev@hadoop.apache.org by Prabhu Joseph <pr...@gmail.com> on 2022/11/23 16:44:06 UTC
MapReduce Terasort job is slow on Java 11
Hi, does anyone have pointers on why the MapReduce Terasort job is slower on
Java 11 than on Java 8? Input data, configs, number of worker nodes, node
instance type, Hadoop version and resources are the same in both runs.
I compared the application logs of the good and bad runs and observed that the
average task time (both map and reduce) is higher on Java 11.
*Java 8: 7 min 2 secs*
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
  -Dmapred.reduce.tasks=120 \
  /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
  /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
2022-11-23 12:29:59,520 INFO terasort.TeraSort: done
*Java 11: 9 min 37 secs*
[hadoop@ip-172-31-60-208 ~]$ hadoop jar \
  /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
  -Dmapred.reduce.tasks=120 \
  /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
  /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
Thanks,
Prabhu Joseph
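As a cross-check of the run times above, the elapsed time of each run can be derived directly from its TeraSort "starting" and "done" log lines. This is a minimal sketch, assuming both lines fall on the same day and carry the `YYYY-MM-DD HH:MM:SS,mmm` prefix shown above; `terasort_elapsed` is a hypothetical helper name, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the seconds elapsed between the TeraSort
# "starting" and "done" log lines read from stdin.
# Assumes both timestamps are on the same day (HH:MM:SS,mmm in field 2).
terasort_elapsed() {
  awk '
    # Convert "HH:MM:SS,mmm" into seconds since midnight.
    function to_secs(ts,   p) {
      split(ts, p, "[:,]")
      return p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
    }
    /TeraSort: starting/ { start = to_secs($2) }
    /TeraSort: done/     { finish = to_secs($2) }
    END {
      # Fail if either marker line was missing from the input.
      if (start == "" || finish == "") { exit 1 }
      printf "%.3f\n", finish - start
    }
  '
}

# Example: the two log lines from the Java 8 run.
printf '%s\n' \
  '2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting' \
  '2022-11-23 12:29:59,520 INFO terasort.TeraSort: done' | terasort_elapsed
```

For the two runs above, the timestamps give 437.572 s (about 7 min 18 s) for Java 8 and 577.624 s (about 9 min 38 s) for Java 11, so the Java 11 run took roughly 32% longer.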
Re: MapReduce Terasort job is slow on Java 11
Posted by Ashutosh Gupta <as...@gmail.com>.
Hi Wei-Chiu,
We are running all processes on JDK11, not just the MR job.
Thanks,
Ashutosh
On Wed, Nov 23, 2022 at 4:59 PM Wei-Chiu Chuang <we...@apache.org> wrote:
> For the JDK11 case, is everything on the cluster running on JDK11, or only
> the MR job?
>
> We have users running JDK11 in production. NameNode GC performance was the
> only issue we were aware of.
> In the past, we noticed that large NameNodes performed worse on JDK11, which
> uses G1GC by default, than on JDK8 with CMS.
>
>
> On Wed, Nov 23, 2022 at 8:44 AM Prabhu Joseph <pr...@gmail.com>
> wrote:
>
> > Hi, does anyone have pointers on why the MapReduce Terasort job is slower
> > on Java 11 than on Java 8? Input data, configs, number of worker nodes,
> > node instance type, Hadoop version and resources are the same in both
> > runs.
> > I compared the application logs of the good and bad runs and observed that
> > the average task time (both map and reduce) is higher on Java 11.
> >
> > *Java 8: 7 min 2 secs*
> >
> > hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
> >   -Dmapred.reduce.tasks=120 \
> >   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
> >   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> > 2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
> > 2022-11-23 12:29:59,520 INFO terasort.TeraSort: done
> >
> > *Java 11: 9 min 37 secs*
> >
> > [hadoop@ip-172-31-60-208 ~]$ hadoop jar \
> >   /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
> >   -Dmapred.reduce.tasks=120 \
> >   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
> >   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> > 2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
> > 2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
> >
> > Thanks,
> > Prabhu Joseph
> >
>
Re: MapReduce Terasort job is slow on Java 11
Posted by Wei-Chiu Chuang <we...@apache.org>.
For the JDK11 case, is everything on the cluster running on JDK11, or only
the MR job?
We have users running JDK11 in production. NameNode GC performance was the
only issue we were aware of.
In the past, we noticed that large NameNodes performed worse on JDK11, which
uses G1GC by default, than on JDK8 with CMS.
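One way to test whether the collector change also explains the Terasort slowdown is to rerun the Java 11 job with the map and reduce task JVMs pinned to a specific collector and compare task times. This is a minimal sketch under stated assumptions: `terasort_with_gc` and the `output-*` directory names are hypothetical; `mapreduce.job.reduces` is the current name of the deprecated `mapred.reduce.tasks` used in the original command; Parallel GC is chosen because it is JDK 8's default collector (CMS via `-XX:+UseConcMarkSweepGC` is deprecated but still present in JDK 11). Setting `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts` replaces whatever the cluster already configures there (for example an explicit `-Xmx`); this assumes Hadoop 3.x, where the task heap is derived from the container size when no `-Xmx` is given, otherwise copy those options in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun TeraSort with the task JVMs pinned to one GC flag
# (e.g. -XX:+UseParallelGC or -XX:+UseG1GC) so the runs can be compared.
# NOTE: these -D options override any java.opts already set on the cluster.
EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

terasort_with_gc() {
  local gc_flag="$1" input="$2" output="$3"
  hadoop jar "$EXAMPLES_JAR" terasort \
    -Dmapreduce.job.reduces=120 \
    -Dmapreduce.map.java.opts="$gc_flag" \
    -Dmapreduce.reduce.java.opts="$gc_flag" \
    "$input" "$output"
}

# Example usage (each output directory must not exist yet):
#   terasort_with_gc -XX:+UseParallelGC \
#     /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
#     /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output-parallelgc/
#
# To see which collector a JDK selects by default on a given host:
#   java -XX:+PrintFlagsFinal -version | grep -E 'Use(G1|Parallel)GC '
```

If the Parallel GC run on Java 11 comes close to the Java 8 time, the default collector is a likely contributor; if not, the cause lies elsewhere.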
On Wed, Nov 23, 2022 at 8:44 AM Prabhu Joseph <pr...@gmail.com>
wrote:
> Hi, does anyone have pointers on why the MapReduce Terasort job is slower on
> Java 11 than on Java 8? Input data, configs, number of worker nodes, node
> instance type, Hadoop version and resources are the same in both runs.
> I compared the application logs of the good and bad runs and observed that
> the average task time (both map and reduce) is higher on Java 11.
>
> *Java 8: 7 min 2 secs*
>
> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
>   -Dmapred.reduce.tasks=120 \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
> 2022-11-23 12:29:59,520 INFO terasort.TeraSort: done
>
> *Java 11: 9 min 37 secs*
>
> [hadoop@ip-172-31-60-208 ~]$ hadoop jar \
>   /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
>   -Dmapred.reduce.tasks=120 \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
> 2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
>
> Thanks,
> Prabhu Joseph
>
Re: MapReduce Terasort job is slow on Java 11
Posted by Ashutosh Gupta <as...@gmail.com>.
Hi Prabhu,
Thanks for bringing this to our attention. I can see it tracked as part of
this JIRA: HADOOP-15338 <https://issues.apache.org/jira/browse/HADOOP-15338>.
The degradation was also pointed out in the past by @Ayush and @zhenhe. We
should continue this discussion.
@Ayush, did we find out the root cause by any chance?
Thanks & Regards,
Ashutosh
On Wed, Nov 23, 2022 at 4:44 PM Prabhu Joseph <pr...@gmail.com>
wrote:
> Hi, does anyone have pointers on why the MapReduce Terasort job is slower on
> Java 11 than on Java 8? Input data, configs, number of worker nodes, node
> instance type, Hadoop version and resources are the same in both runs.
> I compared the application logs of the good and bad runs and observed that
> the average task time (both map and reduce) is higher on Java 11.
>
> *Java 8: 7 min 2 secs*
>
> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
>   -Dmapred.reduce.tasks=120 \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:41,948 INFO terasort.TeraSort: starting
> 2022-11-23 12:29:59,520 INFO terasort.TeraSort: done
>
> *Java 11: 9 min 37 secs*
>
> [hadoop@ip-172-31-60-208 ~]$ hadoop jar \
>   /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort \
>   -Dmapred.reduce.tasks=120 \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/input/ \
>   /tmp/terasort/127130b1-ceb0-422c-a957-48c651b20f30/output/
> 2022-11-23 12:22:44,167 INFO terasort.TeraSort: starting
> 2022-11-23 12:32:21,791 INFO terasort.TeraSort: done
>
> Thanks,
> Prabhu Joseph
>